ASYMPTOTIC EFFICIENCY AND FINITE-SAMPLE PROPERTIES
OF THE GENERALIZED PROFILING ESTIMATION OF
PARAMETERS IN ORDINARY DIFFERENTIAL EQUATIONS

By Xin Qi and Hongyu Zhao 

Yale University 

Ordinary differential equations (ODEs) are commonly used to 
model dynamic behavior of a system. Because many parameters are 
unknown and have to be estimated from the observed data, there 
is growing interest in statistics to develop efficient estimation pro- 
cedures for these parameters. Among the proposed methods in the 
literature, the generalized profiling estimation method developed by 
Ramsay and colleagues is particularly promising for its computational 
efficiency and good performance. In this approach, the ODE solution 
is approximated with a linear combination of basis functions. The co- 
efficients of the basis functions are estimated by a penalized smooth- 
ing procedure with an ODE-defined penalty. However, the statistical 
properties of this procedure are not known. In this paper, we first 
give an upper bound on the uniform norm of the difference between 
the true solutions and their approximations. Then we use this bound 
to prove the consistency and asymptotic normality of this estimation 
procedure. We show that the asymptotic covariance matrix is the 
same as that of the maximum likelihood estimation. Therefore, this 
procedure is asymptotically efficient. For a fixed sample and fixed 
basis functions, we study the limiting behavior of the approxima- 
tion when the smoothing parameter tends to infinity. We propose an 
algorithm to choose the smoothing parameters and a method to com- 
pute the deviation of the spline approximation from solution without 
solving the ODEs. 

1. Introduction. Ordinary differential equations (ODEs) are often used 
to model dynamic processes in engineering, biology and many other areas. 
For example, the dynamic behavior of gene regulation networks can be
modeled by a set of ODEs (see Gardner et al. [9] and Cao and Zhao [5]). These
ODEs usually involve many unknown parameters. Ideally, we would estimate
these unknown parameters with classical parametric estimators, such as
least squares estimators or maximum likelihood estimators (MLE). However,
most nonlinear ODE systems do not have analytical solutions, numerical
solutions can be time consuming to compute, and it is nontrivial to estimate
the parameter values from observed data that are often very noisy.

Because of the importance of this problem, many methods have been pro- 
posed to estimate parameters in ODEs that cannot be solved analytically. 
One method is through nonlinear least squares (NLS). In this approach, a 
numerical method is used to approximate the solution of the ODEs at a 
given trial set of parameter values and initial conditions. The fitted values 
are input into the nonlinear least squares procedure to update parameter es- 
timates. This NLS approach is computationally intensive since a numerical 
approximation to the solutions is required for each update of the param- 
eters and initial conditions. In addition, the inaccuracy of the numerical 
approximation can be a problem, especially for stiff systems. 

Another approach, called collocation methods, approximates the solution 
by a basis function expansion, such as the cubic spline function. Varah [27] 
suggested a two-stage procedure where the first step fits the observed data 
by least squares using cubic spline functions without considering the ODEs, 
and the second step estimates the parameters by least squares solution of 
the differential equations sampled at a set of points. This approach works 
well for the simple equations considered, but considerable care is required in 
the smoothing step and all the variables in the system need to be measured. 
Ramsay and Silverman [22] and Poyton et al. [20] further developed Varah's 
method by proposing an iterated principal differential analysis, which con- 
verged quickly to the estimates of both the solution and the parameters and 
had substantially improved bias and precision. However, their approach is a 
joint estimation procedure in the sense that it optimizes a single roughness- 
penalized fitting criterion with respect to both the coefficients of the basis 
expansion and the parameters. The effect of the nuisance parameters on the 
fit of the model cannot be controlled. For other collocation methods, see 
Tjoa and Biegler [24], Arora and Biegler [2] and Bock [3]. 

Most recently, Ramsay et al. [21] proposed a new collocation method
called the generalized profiling procedure. In this approach, the ODE solution
is approximated by a linear combination of basis functions. However, the
coefficients of the basis functions are estimated by a penalized smoothing
procedure with an ODE-defined penalty. The smoothing parameter controls
the trade-off between fitting the data with the basis functions and fidelity of
the basis functions to the ODEs. Their method has several unique aspects.
The computational load is much lower than for other methods because it avoids
numerically solving ODEs. It can estimate some ODE components even if
they are not observed. It is easy to estimate uncertainties in parameter es- 
timates and simulation experiments suggested that there is good agreement 
between estimated uncertainties and actual estimation accuracies. In addi- 
tion, this approach does not require a formulation of the dynamic model as 
an initial value problem in situations where initial values are not available. 

Despite these attractive features, little is known about the statistical prop- 
erties of the estimates from this procedure, such as consistency and asymp- 
totic normality. Furthermore, it is not clear how to choose the smoothing 
parameters automatically. In this article, to study the asymptotic
properties, we first derive an upper bound on the uniform norm of the
difference between the ODE solutions and their approximations in terms of the
smoothing parameters and the distance between the approximation space 
and the solutions (the distance can be controlled by the knots when the cu- 
bic spline functions are used as approximations). Then this bound is used to 
prove the asymptotic consistency of the parameter estimation if the smooth- 
ing parameter goes to infinity and the distance between the approximation 
space and the space of the ODE solutions goes to zero. If the smoothing pa- 
rameter and the distance satisfy certain conditions on the convergence rate, 
we prove the asymptotic normality for the parameter estimation and show 
that its asymptotic covariance matrix is the same as that of the maximum 
likelihood estimation. Therefore, the profiling procedure is asymptotically 
efficient. We note that our asymptotic results are also true for partially ob- 
served systems (only parts of the components are observed). According to 
these results, we propose an algorithm to choose the smoothing parameters 
automatically. 

One innovative feature of the profiling procedure is that it incorporates a
penalty term to estimate the coefficients in the first step. This penalty is the
$L^2$-norm of the difference between the two sides of the ODEs, evaluated
by plugging in the approximation functions. From the theory of differential
equations, for such a penalty (even with the $L^\infty$-norm), the bound on the
difference between the approximations and the solutions grows exponentially
in time, however small the penalty is. As a result, if the time interval is
large, the bound will be too large to be useful. However, the results in
Ramsay et al. [21] and our simulation studies indicate that when the smoothing
parameter becomes large, the approximations to the solutions are very good
and show no trend of exponential growth. To explain this phenomenon, we fix
the sample and the approximation space, and study the limiting situation as
the smoothing parameter goes to infinity. We show that any such sequence of
approximations has a subsequence converging to one of the minimum functions
of the penalty in the approximation space. We study the properties of these
minimum functions and give a bound on the uniform norms of the differences
between these functions and the solutions in the one-dimensional case with
B-spline bases. The bound depends almost linearly on the length of the time
interval, or is almost independent of the length of the time interval under
stronger conditions. This result explains the phenomenon noted above and
motivates a method to compute the deviation of the spline approximation from
the solutions without solving the ODEs.

Olhede [19] outlined some asymptotic results for the profiling procedure
proposed by Ramsay and colleagues. In order to achieve asymptotic
consistency, Olhede [19] took the number of the B-spline basis functions to be
of order $O(n)$, where $n$ is the sample size. She then imposed the conditions
that the penalties have order $O(n^{-\delta})$ for some $\delta > 0$ and that the sum
of the penalties scaled by the smoothing parameters has order $O(n)$. It was
derived that the smoothing parameters have order $O(n^{1+\delta})$. However, it is not easy to tune
these parameters to ensure that the penalties satisfy the imposed conditions, 
because only the number of the bases and the smoothing parameters can be 
tuned and the values of the penalties depend on these two sets of tuning 
parameters through complex relationships. Therefore, it is better to impose 
conditions only on the tuning parameters. Furthermore, although we obtain 
the approximations to the solutions by choosing a set of basis functions and 
computing the coefficients by solving a penalized optimization problem, the 
approximations only depend on the linear space spanned by the bases. In 
fact, if we choose another set of bases in the same space, we should get the 
same approximations. Hence, an appropriate theory should put conditions 
on the approximation space instead of the bases. In her discussion, Olhede
[19] proposed to use an $L^\infty$-penalty instead of an $L^1$- or $L^2$-penalty. Our
asymptotic results hold for all these penalties, although the smoothing
parameters take different convergence rates for different norms. Unfortunately,
the bound in Theorem 3.1, which increases exponentially in time, remains for
all these penalties. The results for a fixed sample, where the smoothing
parameter converges to infinity, can only be shown for the $L^2$-penalty,
because there we use the special property of the $L^2$-norm that we can
interchange the integration over time and the differentiation with respect to
the parameters under this norm.

Lele [16] raised the question as to what kind of asymptotics is appropri- 
ate: infill asymptotics, or increasing domain asymptotics, or both. Here we 
choose the infill asymptotics for the following reasons. First, one key point 
in our proof is the uniform boundedness of the solutions to ODEs on a finite 
time interval for all the parameters in a compact subset of the parameter 
space. In order to do that, we make assumptions about the existence of the 
solutions and the smoothness of the functions in ODEs. However, the exis- 
tence does not always hold or is not easy to check for nonlinear ODEs [see 
Remark 2(1) after Assumption 2]. If we choose a fixed time interval, for ex- 
ample, the largest sampled time as the endpoint of the interval, there exists 
at least a neighborhood of the true parameters such that the solutions ex- 
ist for the parameters in this neighborhood. However, these conditions and 
assumptions could be seriously violated for increasing domain asymptotics.
Second, the difference between the basis function approximations and the 
true solutions could increase exponentially with time (see Theorem 3.1 and 
the remarks after it). This will affect the accuracy of the estimate. Last, if we 
are only interested in estimating the parameters, it is adequate to sample an 
increasing number of points in a finite interval under the assumptions that 
the model is correct, the functions in ODEs are smooth, and the parameters 
are identifiable. 

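Now, we describe the model and the profiling procedure in detail. Let the
parameter space $\Theta$ be an open and convex subset of $\mathbb{R}^d$. We use
$\theta_0$ to denote the true parameter. Consider the following ODEs:

$$
\begin{aligned}
\frac{dx}{dt}(t) &= F(x(t), z(t), t, \theta),\\
\frac{dz}{dt}(t) &= G(x(t), z(t), t, \theta),\\
x(0) &= x_0, \qquad z(0) = z_0,
\end{aligned}
\tag{1.1}
$$

on the time interval $[0,T]$ with $\mathbf{x}: [0,T] \to \mathbb{R}^{d_1}$,
$\mathbf{z}: [0,T] \to \mathbb{R}^{d_2}$,
$F: \mathbb{R}^{d_1} \times \mathbb{R}^{d_2} \times [0,T] \times \Theta \to \mathbb{R}^{d_1}$ and
$G: \mathbb{R}^{d_1} \times \mathbb{R}^{d_2} \times [0,T] \times \Theta \to \mathbb{R}^{d_2}$.
$F$ and $G$ have known functional forms with some unknown parameters, and
$x_0$ and $z_0$ are the initial values in $\mathbb{R}^{d_1}$ and $\mathbb{R}^{d_2}$.
Suppose that the initial values can be chosen from a convex open region
$\Gamma \subset \mathbb{R}^{d_1} \times \mathbb{R}^{d_2}$. We assume that for each
$\theta$, the initial value problem (1.1) has a unique solution
$(\mathbf{x}(\theta,t), \mathbf{z}(\theta,t))$ on $[0,T]$.
We use bold face letters to denote functions on $[0,T]$.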

The following is a concrete example from Ramsay et al. [21]. Consider the
FitzHugh-Nagumo equations which describe the behavior of spike potentials
in the giant axon of squid neurons:

$$
\begin{aligned}
\frac{dV}{dt}(t) &= c\left(V - \frac{V^3}{3} + R\right),\\
\frac{dR}{dt}(t) &= -\frac{1}{c}\,(V - a + bR),
\end{aligned}
$$

where $V$ is the voltage across an axon membrane and $R$ is a recovery variable
summarizing outward currents. The parameters are $\theta = (a, b, c)$, and the
time interval is $[0, 20]$.
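To make the example concrete, the following minimal sketch integrates the FitzHugh-Nagumo system on [0, 20] with a fixed-step fourth-order Runge-Kutta scheme in plain Python. It assumes the standard form of the equations, dV/dt = c(V - V^3/3 + R) and dR/dt = -(V - a + bR)/c; the parameter values, initial values and step count are illustrative assumptions, and the function names `fhn_rhs` and `rk4_solve` are hypothetical. This is the kind of numerical solve that the profiling procedure is designed to avoid inside the estimation loop.

```python
def fhn_rhs(v, r, a, b, c):
    """Right-hand side of the FitzHugh-Nagumo system (assumed standard form)."""
    dv = c * (v - v ** 3 / 3.0 + r)
    dr = -(v - a + b * r) / c
    return dv, dr


def rk4_solve(a, b, c, v0, r0, t_end=20.0, n_steps=2000):
    """Integrate on [0, t_end] with the classical fixed-step RK4 scheme."""
    h = t_end / n_steps
    v, r = v0, r0
    path = [(0.0, v, r)]
    for k in range(n_steps):
        k1v, k1r = fhn_rhs(v, r, a, b, c)
        k2v, k2r = fhn_rhs(v + 0.5 * h * k1v, r + 0.5 * h * k1r, a, b, c)
        k3v, k3r = fhn_rhs(v + 0.5 * h * k2v, r + 0.5 * h * k2r, a, b, c)
        k4v, k4r = fhn_rhs(v + h * k3v, r + h * k3r, a, b, c)
        v += h * (k1v + 2.0 * k2v + 2.0 * k3v + k4v) / 6.0
        r += h * (k1r + 2.0 * k2r + 2.0 * k3r + k4r) / 6.0
        path.append(((k + 1) * h, v, r))
    return path


# Illustrative (assumed) parameter and initial values.
path = rk4_solve(a=0.2, b=0.2, c=3.0, v0=-1.0, r0=1.0)
```

Each entry of `path` is a tuple (t, V(t), R(t)); noisy observations for a simulation study could be generated by adding random errors to the V or R components at sampled times.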

Assuming the underlying model (1.1), suppose that $(Y_1,T_1),\dots,(Y_n,T_n)$
are i.i.d. observations, where $T_i$ is the sample time and $Y_i$ is the
observed data at time $T_i$. The $T_i$'s are independent random variables
distributed on the interval $[0,T]$ with distribution $Q$, and the $Y_i$'s take
values in $\mathbb{R}^{d_1}$. To fit a model of form (1.1) to the observed data,
suppose we have the following data fitting criterion:

$$
H_n(\mathbf{x}(\theta,\cdot)) = -\frac{1}{n}\sum_{i=1}^{n} g(Y_i, x(\theta, T_i)),
$$

where $g(y, x)$ is a function on $\mathbb{R}^{d_1} \times \mathbb{R}^{d_1}$ and $H_n$
is a functional on the space of functions on $[0,T]$ such that for any
function $\mathbf{x}(t)$,

$$
H_n(\mathbf{x}) = -\frac{1}{n}\sum_{i=1}^{n} g(Y_i, x(T_i)).
$$

Here, we assume that only part of the system is observable, that is, the
$Y_i$'s only depend on the first $d_1$ components of the solution. The
following are two such examples.

Example 1 (Nonlinear least squares). Suppose that $d_1 = 1$ and

$$
Y_i = x(\theta_0, T_i) + \varepsilon_i,
$$

where $\{\varepsilon_i, i = 1,\dots,n\}$ are independent random variables with
the same distribution $N(0, \sigma^2)$. Here we take $g(y, x) = (y - x)^2$.

Example 2 (Logistic regression). Suppose that $d_1 = 1$ and the conditional
distribution of $Y_i$ given $T_i = t$ is Bernoulli with success probability

$$
\frac{e^{x(\theta_0, t)}}{1 + e^{x(\theta_0, t)}}.
$$

Hence,

$$
g(y, x) = \log(1 + e^{x}) - xy.
$$
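The two examples can be written down directly as loss functions together with the data fitting criterion, which is minus the average loss over the sample. The sketch below is only an illustration in plain Python; the names `squared_loss`, `logistic_loss` and `fitting_criterion` are hypothetical, and `x_func` stands for any candidate trajectory t -> x(t).

```python
import math


def squared_loss(y, x):
    """g(y, x) = (y - x)^2, as in Example 1."""
    return (y - x) ** 2


def logistic_loss(y, x):
    """g(y, x) = log(1 + e^x) - x*y, as in Example 2.

    log(1 + e^x) is computed as max(x, 0) + log1p(exp(-|x|)) to avoid
    overflow for large |x|.
    """
    return max(x, 0.0) + math.log1p(math.exp(-abs(x))) - x * y


def fitting_criterion(g, ys, ts, x_func):
    """H_n(x) = -(1/n) * sum_i g(Y_i, x(T_i)) for a candidate trajectory."""
    n = len(ys)
    return -sum(g(y, x_func(t)) for y, t in zip(ys, ts)) / n


# Tiny usage example with a hypothetical trajectory x(t) = t.
h_value = fitting_criterion(squared_loss, [1.0, 2.0], [0.0, 1.0], lambda t: t)
```

Larger values of the criterion correspond to a better fit, which is why the procedure maximizes it rather than minimizing the loss.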

For simplicity, we shall restrict ourselves to the case $d_1 = d_2 = d = 1$.
First, we introduce some function spaces and some norms in those spaces.
Consider the space of continuously differentiable functions

$$
C^1([0,T]) = \{f : \text{both } f \text{ and } f' \text{ are continuous functions on } [0,T]\}
$$

and the space of square integrable functions

$$
L^2([0,T]) = \left\{f : f \text{ is a measurable function on } [0,T]
\text{ and } \int_0^T |f(t)|^2\,dt < \infty\right\}.
$$

We mainly consider the functions in the space $C^1[0,T]$. For any
$f \in C^1[0,T]$, define

$$
\|f\|_\infty = \sup_{t \in [0,T]} |f(t)|,
\qquad
\|f\|_{L^2([0,T])} = \left[\int_0^T |f(t)|^2\,dt\right]^{1/2}.
$$

We have two inequalities for these two norms which we will use below. First,

$$
\|f\|_{L^2([0,T])} \le \sqrt{T}\,\|f\|_\infty,
\tag{1.2}
$$

and, since by the Hölder inequality

$$
\left|\int_0^t f(s)\,ds\right|^2 \le t \int_0^t |f(s)|^2\,ds
\le T \int_0^T |f(s)|^2\,ds \qquad \forall t \in [0,T],
$$

we have

$$
\left\|\int_0^{\cdot} f(s)\,ds\right\|_\infty^2 \le T\,\|f\|_{L^2([0,T])}^2,
\tag{1.3}
$$

where $\|\int_0^{\cdot} f(s)\,ds\|_\infty$ means the norm of the function
$h(t) = \int_0^t f(s)\,ds$, $0 \le t \le T$.
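As a quick sanity check, the two norm inequalities (the L2 norm is at most sqrt(T) times the uniform norm, and the squared uniform norm of the running integral is at most T times the squared L2 norm) can be verified numerically for a test function. The sketch below uses a trapezoid rule on a uniform grid; the function name `check_norm_inequalities` and the test function are hypothetical choices made only for illustration.

```python
import math


def check_norm_inequalities(f, T, n=2000):
    """Approximate both sides of the two inequalities on a uniform grid."""
    h = T / n
    values = [f(i * h) for i in range(n + 1)]

    sup_norm = max(abs(v) for v in values)

    # Trapezoid rule for the squared L2 norm of f.
    squares = [v * v for v in values]
    l2_squared = h * (sum(squares) - 0.5 * (squares[0] + squares[-1]))

    # Running trapezoid integral h(t) = int_0^t f(s) ds and its sup norm.
    running, sup_integral = 0.0, 0.0
    for i in range(1, n + 1):
        running += 0.5 * h * (values[i - 1] + values[i])
        sup_integral = max(sup_integral, abs(running))

    return {
        "l2_norm": math.sqrt(l2_squared),
        "bound_l2": math.sqrt(T) * sup_norm,
        "sup_integral_squared": sup_integral ** 2,
        "bound_integral": T * l2_squared,
    }


result = check_norm_inequalities(lambda t: math.cos(3.0 * t) + 0.5 * t, T=2.0)
```

For this test function both `l2_norm <= bound_l2` and `sup_integral_squared <= bound_integral` hold with a comfortable margin, so the discretization error does not affect the comparison.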

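In this paper, we consider the following penalty: for any $\theta$ and
$\mathbf{x}, \mathbf{z} \in C^1[0,T]$,

$$
\begin{aligned}
J(\mathbf{x}, \mathbf{z}, \theta)
&= \int_0^T \left[\left(\frac{dx}{dt}(t) - F(x(t), z(t), t, \theta)\right)^2
 + \left(\frac{dz}{dt}(t) - G(x(t), z(t), t, \theta)\right)^2\right] dt\\
&= \left\|\frac{d\mathbf{x}}{dt} - F(\mathbf{x}, \mathbf{z}, t, \theta)\right\|_{L^2[0,T]}^2
 + \left\|\frac{d\mathbf{z}}{dt} - G(\mathbf{x}, \mathbf{z}, t, \theta)\right\|_{L^2[0,T]}^2 .
\end{aligned}
\tag{1.4}
$$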
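The penalty can be evaluated for any pair of candidate functions without solving the ODEs, since it only requires the candidates, their derivatives and the right-hand sides F and G. The following is a minimal sketch using a trapezoid rule; the function name `penalty_J`, the grid size and the linear test system dx/dt = theta*z, dz/dt = -theta*x are assumptions made for illustration only.

```python
import math


def penalty_J(x, dx, z, dz, F, G, theta, T, n=2000):
    """Trapezoid-rule approximation of the ODE-defined penalty on [0, T].

    x, z are candidate functions and dx, dz their time derivatives.
    """
    h = T / n
    total = 0.0
    for i in range(n + 1):
        t = i * h
        res_x = dx(t) - F(x(t), z(t), t, theta)
        res_z = dz(t) - G(x(t), z(t), t, theta)
        weight = 0.5 if i in (0, n) else 1.0  # end points get half weight
        total += weight * (res_x ** 2 + res_z ** 2)
    return h * total


# Hypothetical linear test system with exact solution (sin(theta t), cos(theta t)).
F = lambda x, z, t, theta: theta * z
G = lambda x, z, t, theta: -theta * x
theta, T = 1.0, 2.0

# Exact solution: the penalty vanishes.
J_exact = penalty_J(
    lambda t: math.sin(theta * t), lambda t: theta * math.cos(theta * t),
    lambda t: math.cos(theta * t), lambda t: -theta * math.sin(theta * t),
    F, G, theta, T,
)

# Perturbed first component x(t) = sin(theta t) + 0.1 t: the penalty is positive.
J_perturbed = penalty_J(
    lambda t: math.sin(theta * t) + 0.1 * t,
    lambda t: theta * math.cos(theta * t) + 0.1,
    lambda t: math.cos(theta * t), lambda t: -theta * math.sin(theta * t),
    F, G, theta, T,
)
```

For the perturbed candidate the two residuals are 0.1 and 0.1*theta*t, so the penalty equals 0.01*T + 0.01*theta^2*T^3/3, which the quadrature reproduces closely.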



Note that $J$ was used to denote the whole penalized log likelihood criterion
in Ramsay et al. [21] and this is different from our definition. We use
$\mathbb{P}_n$ to denote the empirical measure. For example,

$$
\mathbb{P}_n\, g(Y, \mathbf{x}(T)) = \frac{1}{n}\sum_{i=1}^{n} g(Y_i, x(T_i)).
$$

Suppose that $\{L_n, n \ge 1\}$ is a sequence of finite-dimensional subspaces of
$C^1[0,T]$. We will use the functions in $L_n$ to approximate the solutions of
the ODEs. For example, we can choose $L_n$ to be the space of cubic spline
functions with knots
$\tau^{(n)} = (0 = t_1^{(n)} < \cdots < t_{k_n}^{(n)} = T)$. Define

$$
|\tau^{(n)}| = \max_{2 \le i \le k_n} |t_i - t_{i-1}|.
$$

Let $|\tau^{(n)}| \to 0$ as $n \to \infty$.

Sometimes the initial values of the systems are unknown. In this case, we
have to regard the initial values as nuisance parameters. Define
$\theta^* = (\theta, x, z)$, the combination of the parameters and the initial
values. Let $\theta_0^*$ be the combination of the true parameters and the true
initial values. We rewrite the solutions of (1.1) as

$$
(\mathbf{x}(\theta^*, t), \mathbf{z}(\theta^*, t)).
$$
In the next section, we describe the profiling procedure in detail and
propose a method for choosing the smoothing parameter. In Section 3, the
consistency and the asymptotic efficiency are given. In Section 4, we study
the limiting behavior of the basis function approximations to the true
solutions of the ODEs as the smoothing parameter goes to infinity. The proofs
of these results are given in the last section.

2. The generalized profiling procedure and the choice of the smoothing
parameters. In the generalized profiling procedure proposed in Ramsay et
al. [21], a finite-dimensional space $L_n$ of functions on $[0,T]$ is chosen
first (this is equivalent to choosing a set of basis functions). Approximation
functions to the solutions of the ODEs are then chosen from $L_n$ (this is
equivalent to choosing the coefficients of the basis functions). One innovative
feature of this procedure is that the approximations are chosen by solving the
following penalized optimization problem: for any
$\theta^* = (\theta, x, z) \in \Theta \times \Gamma$,

$$
\begin{aligned}
&\text{maximize} && H_n(\mathbf{x}) - \lambda J(\mathbf{x}, \mathbf{z}, \theta),\\
&\text{subject to} && \mathbf{x} \in L_n,\ \mathbf{z} \in L_n,\ \mathbf{x}(0) = x,\ \mathbf{z}(0) = z,
\end{aligned}
\tag{2.1}
$$

where the penalty $J$ regularizes the approximations by controlling the extent
to which the approximation functions fail to satisfy the ODEs, and $\lambda$
is the smoothing parameter which controls the amount of regularization. In
this paper, to simplify notation, we choose the same $\lambda$ for all the
components in the penalty. Ramsay et al. [21] allowed different smoothing
parameters for different components. The asymptotic results in this paper can
be easily extended to this latter case, provided that all the smoothing
parameters have the same asymptotic order. The existence of global solutions
to (2.1) will be discussed below in this section. Here, we assume that the
global solutions exist. Let
$(\mathbf{x}(\theta^*, \lambda, t), \mathbf{z}(\theta^*, \lambda, t))$ be solutions
to (2.1). They depend on both $\lambda$ and $\theta^*$. Then we plug them into
the functional $H_n$. The estimates $\hat\theta^*(\lambda)$ are obtained by
maximizing $H_n(\mathbf{x}(\theta^*, \lambda, \cdot))$ with respect to $\theta^*$.
A small $\lambda$ makes both the optimization problem (2.1) and the
maximization of $H_n(\mathbf{x}(\theta^*, \lambda, \cdot))$ robust with respect to
poor initial guesses, but gives bad approximations to the solutions. On the
other hand, a large $\lambda$ gives good approximations, but the optimization
problems are sensitive to the initial values.

In Theorem 3.1 below, the uniform norm of the differences between the
exact solutions and the basis function approximations is bounded by a
sum of two terms, $O_p(1/\sqrt{\lambda})$ and $O(r_n)$, where $r_n$ is a measure
of the distance between $L_n$ and the solutions to the ODEs. $r_n$ does not
depend on $\lambda$. When $\lambda$ becomes very large, the bound is dominated
by the second term. In this case, it is useless to increase $\lambda$ further.
We can use these observations to explain the patterns of Figure 6 in Ramsay
et al. [21] and Figure 15 in Huang [12]. Those figures
have different knots and observations, but they show similar patterns.
When we increase $\lambda$, at first the parameter estimates become better
(in both bias and variance). But after some point, increasing $\lambda$
has little effect on the parameter estimates.

The above discussion suggests that we should initially choose a small
$\lambda$ and then increase $\lambda$ until the parameter estimates become
stable. In Section 2.8.1 in Ramsay et al. [21], the authors proposed to stop
increasing $\lambda$ when the norm of the difference between the solutions to
the ODEs and the approximations begins to increase after attaining a minimum.
Because the penalties are nonlinear functionals of the approximation
functions, the approximation functions may depend on $\lambda$ in a complex
way. There may exist many local minima before $\lambda$ becomes large enough.
Hence, it may be better to design the stopping rule according to the
performance of the parameter estimates. One of the major advances of this
paper is that it provides accurate estimates of the confidence intervals of
the estimated parameters. We can compare the confidence intervals for
different $\lambda$ to decide whether to increase $\lambda$ or stop. Because
computing confidence intervals is time consuming, we start calculating them
only when $\lambda$ is moderately large. We propose the following algorithm:

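(1) Choose the space $L_n$, a moderately large positive number $\lambda_0$ and a
small number $\alpha$.

(2) Choose a small initial value for $\lambda$, and an initial guess for
$\theta^*$.

(3) Obtain the estimates $\hat\theta^*(\lambda)$ by maximizing
$H_n(\mathbf{x}(\theta^*, \lambda, \cdot))$ with respect to $\theta^*$, where for
each $\theta^* = (\theta, x, z) \in \Theta \times \Gamma$,

$$
(\mathbf{x}(\theta^*, \lambda, t), \mathbf{z}(\theta^*, \lambda, t)) \in
\operatorname*{arg\,max}_{\substack{\mathbf{x}, \mathbf{z} \in L_n\\ \mathbf{x}(0) = x,\ \mathbf{z}(0) = z}}
[H_n(\mathbf{x}) - \lambda J(\mathbf{x}, \mathbf{z}, \theta)].
$$

(4) If $\lambda < \lambda_0$, set $\hat\theta^*(\lambda)$ to be the initial value for
the next iteration and let $\lambda = 10 \times \lambda$. Go to step (3).

If $\lambda \ge \lambda_0$, calculate the confidence intervals for the parameter
estimates. Compare the confidence intervals with those for the previous value
of $\lambda$ (if they exist).

- If the ratios of the overlaps to both of the intervals are larger than
$1 - \alpha$, stop and go to step (5).

- Otherwise, set $\hat\theta^*(\lambda)$ to be the initial value for the next
iteration and let $\lambda = 10 \times \lambda$, then go to step (3).

(5) Set $\lambda_n = \lambda$ and $\hat\theta^*_n = \hat\theta^*(\lambda)$. $\lambda_n$ is
the final choice of the smoothing parameter and $\hat\theta^*_n$ is the profiling
estimator of the unknown parameters. Set
$\mathbf{x}_n(\theta^*, t) = \mathbf{x}(\theta^*, \lambda_n, t)$ and
$\mathbf{z}_n(\theta^*, t) = \mathbf{z}(\theta^*, \lambda_n, t)$.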
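The control flow of the algorithm above can be sketched as a short loop. This is only an illustration under stated assumptions: `fit(lam, theta_start)` is a hypothetical callable that performs step (3) for a given smoothing parameter and starting value, `conf_ints(lam, theta)` is a hypothetical callable returning one (lower, upper) interval per parameter, and the `max_iter` safeguard is an addition that is not part of the algorithm as stated. The toy `fit` and `conf_ints` at the bottom are placeholders, not a profiling implementation.

```python
def overlap_ratios(ci_a, ci_b):
    """Ratios of the overlap length to the length of each interval."""
    overlap = max(0.0, min(ci_a[1], ci_b[1]) - max(ci_a[0], ci_b[0]))
    return overlap / (ci_a[1] - ci_a[0]), overlap / (ci_b[1] - ci_b[0])


def choose_lambda(fit, conf_ints, theta_init, lam_init=1.0, lam0=100.0,
                  alpha=0.05, max_iter=30):
    """Increase lambda tenfold until the confidence intervals stabilize."""
    lam, theta, prev_cis = lam_init, theta_init, None
    for _ in range(max_iter):
        theta = fit(lam, theta)              # step (3), warm-started
        if lam >= lam0:                      # step (4), second branch
            cis = conf_ints(lam, theta)
            if prev_cis is not None and all(
                min(overlap_ratios(c, p)) > 1.0 - alpha
                for c, p in zip(cis, prev_cis)
            ):
                return lam, theta            # step (5)
            prev_cis = cis
        lam *= 10.0                          # otherwise increase lambda
    raise RuntimeError("confidence intervals did not stabilize")


# Toy placeholders: the estimate approaches 1 as lambda grows.
def toy_fit(lam, theta_start):
    return [1.0 + 1.0 / lam]


def toy_conf_ints(lam, theta):
    return [(theta[0] - 0.1, theta[0] + 0.1)]


lam_final, theta_final = choose_lambda(toy_fit, toy_conf_ints, [0.0])
```

With these toy inputs the loop visits lambda = 1, 10, 100 and 1000; the intervals at 100 and 1000 overlap by more than 95 percent of their lengths, so the loop stops at lambda = 1000.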
Although the penalized optimization problem (2.1) is solved in the
finite-dimensional space $L_n$, the existence of global solutions to (2.1)
cannot be easily verified due to the nonlinearity of $J$. However, if we use
the norm $\|\cdot\|_\infty$ in $L_n$, then under our main assumptions, both
$J(\mathbf{x}, \mathbf{z}, \theta)$ and $H_n(\mathbf{x})$ are continuous functions
of $(\mathbf{x}, \mathbf{z})$. In this case, local solutions always exist if we
solve (2.1) in any bounded and closed subset of $(L_n, \|\cdot\|_\infty)$. In this
paper, one important assumption is that the $\hat\theta^*_n$ are uniformly tight,
that is, given any probability arbitrarily close to 1, one can find a compact
subset of the parameter space such that all the $\hat\theta^*_n$ belong to that
compact subset with this high probability. The solutions to the ODEs for any
parameter in that compact subset are uniformly bounded by a positive number,
say $K$. In this case, if we solve (2.1) in the subset of the functions with
norms less than or equal to $K + 1$, then the solutions exist and all proofs
of the main results remain valid with these local solutions in place of the
global solutions. In practice, one can only search a bounded region of the
parameter space, so the estimates are automatically uniformly tight. One can
solve (2.1) in a large but bounded subset of $L_n$.

The above algorithm is based on the assumption that the dynamic models
are correctly specified. The algorithm in Section 2.8.2 in Ramsay et al.
[21] may produce better approximations to the solutions to the ODEs for
misspecified models. In this case, $\lambda$ may play a slightly different
role. Ionides [13] discussed the problem of model misspecification, especially
when there is noise in both the measurements and the dynamics. In this case,
it is difficult to define the true trajectories. There are some alternative
procedures to deal with this problem, for example, iterated filtering (see
Ionides, Breto and King [14] and Breto et al. [4]) in the frequentist domain
or sequential Monte Carlo (see Liu and West [17]) in the Bayesian domain.

3. Consistency and asymptotic normality. We now state our main
assumptions.

Assumption 1. $Q$ has a density $f(t)$ with respect to Lebesgue measure
on $[0,T]$ and $c \le f(t) \le C$ for all $t \in [0,T]$, where $c$ and $C$ are two
positive numbers.

Remark 1. This assumption guarantees that the samples can be taken
anywhere and will not be overly concentrated on a subset of the time interval.

Assumption 2. $F, G \in C^3(\mathbb{R} \times \mathbb{R} \times [0,T] \times \Theta)$.
For each $\theta \in \Theta$ and each pair of initial values
$(x, z) \in \Gamma \subset \mathbb{R} \times \mathbb{R}$, there exists a unique
solution $(\mathbf{x}(\theta^*, \cdot), \mathbf{z}(\theta^*, \cdot))$ of (1.1) on
$[0,T]$, and for any $\theta^* = (\theta, x, z) \ne \theta^{*\prime} = (\theta', x', z')$,
we have $\mathbf{x}(\theta^*, \cdot) \ne \mathbf{x}(\theta^{*\prime}, \cdot)$.






Remark 2.

(1) Here, for developing our theory, we assume that the solutions exist on
$[0,T]$ for all the parameters in the parameter space. One key point in
our paper is that for any compact subset of the parameter space, the
solutions for the parameters in this subset are uniformly bounded on
$[0,T]$, so the existence of the solutions on $[0,T]$ for all parameters is
necessary for our analysis. However, this assumption does not always hold
and is not easy to check for nonlinear ordinary differential equations.
Here is a simple example. Suppose that $\Theta = (0, \infty)$ and consider
the ODE

$$
\frac{dx}{dt} = 1 + \theta x^2, \qquad x(0) = 0.
$$

For fixed $\theta$, the solution only exists on $[0, \frac{\pi}{2\sqrt{\theta}})$ and

$$
x(t) = \frac{1}{\sqrt{\theta}} \tan(\sqrt{\theta}\, t)
\qquad \forall t \in \left[0, \frac{\pi}{2\sqrt{\theta}}\right).
$$

Hence, for any $T$, if $\theta \ge (\frac{\pi}{2T})^2$, there does not exist any
solution on $[0,T]$ with $x(0) = 0$.

In practice, we usually let $T$ be equal to the largest sampled time
point. Hence, at least for the true parameter, the solutions exist on
$[0,T]$. Since here we assume the smoothness of $F$ and $G$, by Theorem 7.4 in
Chapter 1 of Coddington and Levinson [7], there are a neighborhood of
the true initial value and a neighborhood of the true parameter such that
the solutions exist on $[0,T]$ for the initial values and the parameters in
those neighborhoods. If the estimates belong to these neighborhoods, all of
our results still hold even without the assumption about the existence
of the solutions in the whole parameter space. For any parameter for
which the solutions do not exist, according to the extension theorem for
the solutions to ODEs (see Theorem 3.1 in Hartman [11] or Theorem
1.186 in Chicone [6]), there is a $0 < T' \le T$ ($T'$ is parameter dependent)
such that the solutions exist on $[0, T')$ and become unbounded as
$t \to T'$. So one can see that when the sample size and the smoothing
parameter $\lambda$ are large enough, that parameter cannot be our estimate.
But our results may not be true in the situation where our estimates take
values for which the solutions do not exist on $[0,T]$, since the solutions
are then unbounded.

(2) The uniqueness assumption is necessary in our theory. In general, it is
not easy to check this assumption for a given ODE and a time interval.
However, under the assumption that
$F, G \in C^3(\mathbb{R} \times \mathbb{R} \times [0,T] \times \Theta)$, the
existence of the solutions of the initial value problem (1.1) on $[0,T]$ is
sufficient to guarantee the uniqueness of the solutions on $[0,T]$.
(3) The latter requirement in Assumption 2 means that the parameter
estimation problem is identifiable. The model identifiability should be
carefully studied before any statistical inference can be made. There is a
substantial literature on this issue; for example, Xia [28], Xia and Moog
[29], Jeffrey and Xia [15] and Miao et al. [18] investigated the
identifiability of HIV dynamic models. There remain many unsolved problems
in this area, but they are beyond the scope of this paper.
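The blow-up example in Remark 2(1) is easy to check numerically. For the scalar ODE dx/dt = 1 + theta*x^2 with x(0) = 0, the closed form tan(sqrt(theta)*t)/sqrt(theta) blows up at t = pi/(2*sqrt(theta)), so a solution on [0, T] exists only when T is smaller than this time. The sketch below, with hypothetical function names, encodes this and verifies the ODE by a finite difference.

```python
import math


def blow_up_time(theta):
    """Time at which the solution of dx/dt = 1 + theta*x^2, x(0)=0 blows up."""
    return math.pi / (2.0 * math.sqrt(theta))


def exact_solution(theta, t):
    """Closed-form solution, valid only before the blow-up time."""
    if not 0.0 <= t < blow_up_time(theta):
        raise ValueError("the solution does not exist at this time")
    s = math.sqrt(theta)
    return math.tan(s * t) / s


def solution_exists_on(theta, T):
    """True if the solution exists on the whole interval [0, T]."""
    return T < blow_up_time(theta)


# Finite-difference check of the ODE at one point (theta = 0.5, t = 1).
theta, t, h = 0.5, 1.0, 1e-6
derivative = (exact_solution(theta, t + h) - exact_solution(theta, t - h)) / (2.0 * h)
rhs = 1.0 + theta * exact_solution(theta, t) ** 2
```

For T = 2, the solution exists for theta = 0.5 but not for theta = 1, in line with the threshold (pi/(2T))^2, which is roughly 0.617.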



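Lemma 1. Under Assumption 2, there exists a sequence of finite-dimensional
subspaces $\{L_n, n \ge 1\}$ of $C^1[0,T]$ such that for any compact subset
$\Theta_0$ of $\Theta$ and any compact subset $\Gamma_0$ of $\Gamma$, we have

$$
\lim_{n \to \infty}\ \sup_{\theta^* \in \Theta_0 \times \Gamma_0}\
\inf_{\mathbf{w} \in L_n,\ \mathbf{w}(0) = x}
\left( \|\mathbf{x}(\theta^*, \cdot) - \mathbf{w}\|_\infty \vee
\left\|\frac{d\mathbf{x}}{dt}(\theta^*, \cdot) - \frac{d\mathbf{w}}{dt}\right\|_\infty \right) = 0,
$$

$$
\lim_{n \to \infty}\ \sup_{\theta^* \in \Theta_0 \times \Gamma_0}\
\inf_{\mathbf{v} \in L_n,\ \mathbf{v}(0) = z}
\left( \|\mathbf{z}(\theta^*, \cdot) - \mathbf{v}\|_\infty \vee
\left\|\frac{d\mathbf{z}}{dt}(\theta^*, \cdot) - \frac{d\mathbf{v}}{dt}\right\|_\infty \right) = 0,
$$

where $\theta^* = (\theta, x, z)$ and $a \vee b$ denotes $\max(a, b)$ for any real
numbers $a$ and $b$.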


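Proof. Note that in a Euclidean space, a compact subset is just a
bounded closed subset. Let $L_n$ be the space of cubic spline functions with
knots $\tau^{(n)} = (0 = t_1^{(n)} < \cdots < t_{k_n}^{(n)} = T)$. Suppose that
$|\tau^{(n)}| \to 0$ as $n \to \infty$.

Under Assumption 2, $F$ and $G$ have continuous partial derivatives of third
order. Hence, $\frac{\partial^4 x}{\partial t^4}(\theta^*, t)$ and
$\frac{\partial^4 z}{\partial t^4}(\theta^*, t)$ are continuous functions of
$(\theta^*, t)$. Because $\Theta_0 \times \Gamma_0$ is a compact set, we have

$$
\sup_{\theta^* \in \Theta_0 \times \Gamma_0}
\left\|\frac{\partial^4 \mathbf{x}}{\partial t^4}(\theta^*, \cdot)\right\|_\infty < \infty,
\qquad
\sup_{\theta^* \in \Theta_0 \times \Gamma_0}
\left\|\frac{\partial^4 \mathbf{z}}{\partial t^4}(\theta^*, \cdot)\right\|_\infty < \infty.
$$

By Theorems 2 and 4 in Hall and Meyer [10],

$$
\sup_{\theta^* \in \Theta_0 \times \Gamma_0}\ \inf_{\mathbf{w} \in L_n,\ \mathbf{w}(0) = x}
\|\mathbf{x}(\theta^*, \cdot) - \mathbf{w}\|_\infty
\le C_0 \sup_{\theta^* \in \Theta_0 \times \Gamma_0}
\left\|\frac{\partial^4 \mathbf{x}}{\partial t^4}(\theta^*, \cdot)\right\|_\infty |\tau^{(n)}|^4,
$$

$$
\sup_{\theta^* \in \Theta_0 \times \Gamma_0}\ \inf_{\mathbf{w} \in L_n,\ \mathbf{w}(0) = x}
\left\|\frac{d\mathbf{x}}{dt}(\theta^*, \cdot) - \frac{d\mathbf{w}}{dt}\right\|_\infty
\le C_1 \sup_{\theta^* \in \Theta_0 \times \Gamma_0}
\left\|\frac{\partial^4 \mathbf{x}}{\partial t^4}(\theta^*, \cdot)\right\|_\infty |\tau^{(n)}|^3,
$$

where $C_0 = \frac{5}{384}$ and $C_1 = \frac{1}{24}$. Similarly, we can prove the
result for $\mathbf{z}(\theta^*, \cdot)$.

□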

Let $P_{\theta_0^*}$ be the joint distribution of $\{(Y_1, T_1), \dots, (Y_n, T_n)\}$,
which corresponds to the true parameters and the true initial values. Define
the function

$$
M(\theta^*) = -E_{\theta_0^*}[g(Y_1, x(\theta^*, T_1))], \qquad \theta^* \in \Theta \times \Gamma,
$$

where $E_{\theta_0^*}[\cdot]$ is the expectation with respect to $P_{\theta_0^*}$.

Assumption 3. In $\Theta \times \Gamma$, $M(\theta^*)$ is continuous and has a
unique maximum at $\theta_0^*$.

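Remark 3. Under Assumptions 1 and 2, both Examples 1 and 2 satisfy
Assumption 3. Indeed, in Example 1,

$$
\begin{aligned}
M(\theta^*) &= -E_{\theta_0^*}[g(Y_1, x(\theta^*, T_1))]\\
&= -E_{\theta_0^*}[(Y_1 - x(\theta^*, T_1))^2]
 = -\sigma^2 - \int_0^T (x(\theta^*, t) - x(\theta_0^*, t))^2\, Q(dt)\\
&= M(\theta_0^*) - \int_0^T (x(\theta^*, t) - x(\theta_0^*, t))^2\, Q(dt)\\
&\le M(\theta_0^*) - c \int_0^T (x(\theta^*, t) - x(\theta_0^*, t))^2\, dt.
\end{aligned}
$$

By Assumption 2, $\int_0^T (x(\theta^*, t) - x(\theta_0^*, t))^2\, dt = 0$ if and
only if $\theta^* = \theta_0^*$; hence $\theta_0^*$ is the unique maximizer of
$M(\theta^*)$. For Example 2, the conditional expectation is

$$
E_{\theta_0^*}[g(Y_1, x(\theta^*, T_1)) \mid T_1]
= -p(\theta_0^*, T_1) \log p(\theta^*, T_1)
 - (1 - p(\theta_0^*, T_1)) \log(1 - p(\theta^*, T_1)),
$$

where

$$
p(\theta^*, t) = \frac{e^{x(\theta^*, t)}}{1 + e^{x(\theta^*, t)}}.
$$

Because for any fixed number $a \in (0, 1)$, the function

$$
a \log x + (1 - a) \log(1 - x)
$$

attains its unique maximum over $(0, 1)$ at $x = a$,

$$
\begin{aligned}
M(\theta^*) &= -E_{\theta_0^*}[g(Y_1, x(\theta^*, T_1))]\\
&= \int_0^T [p(\theta_0^*, t) \log p(\theta^*, t)
 + (1 - p(\theta_0^*, t)) \log(1 - p(\theta^*, t))]\, Q(dt)\\
&= \int_0^T [p(\theta_0^*, t) \log p(\theta^*, t)
 + (1 - p(\theta_0^*, t)) \log(1 - p(\theta^*, t))]\, f(t)\, dt
\end{aligned}
$$

attains its maximum if and only if $\theta^* = \theta_0^*$.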



Assumption 4. $g(y, x)$ is a nonnegative function and belongs to
$C(\mathbb{R} \times \mathbb{R})$. If $Y_i$ is not a bounded random variable, we
assume that for any compact set $A \subset \mathbb{R}$,

' I + mix(zA g{y,x) 



(3.1) 



lim inf 



>0. 



Remark 4. The latter statement means that for any two fixed points
$x, x'$ in a compact subset, $g(y, x)$ and $g(y, x')$ are comparable as functions
of $y$ when $|y| \to \infty$. Usually, if $g(y, x)$ does not increase too fast as
$|y| \to \infty$, the above condition is satisfied. For example, if $g(y, x)$ is
a polynomial function of $y$ given $x$ and the coefficient of the highest order
term is nonzero for all $x$, then (3.1) is true. Hence, Example 1 satisfies
this assumption. In Example 2, $Y_i$ is bounded.



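Given a compact subset $\Theta_0$ of $\Theta$ and a compact subset $\Gamma_0$ of
$\Gamma$, let

$$
\begin{aligned}
r_n = \max\Bigg\{
&\sup_{\theta^* \in \Theta_0 \times \Gamma_0}\ \inf_{\mathbf{w} \in L_n,\ \mathbf{w}(0) = x}
\left( \|\mathbf{x}(\theta^*, \cdot) - \mathbf{w}\|_\infty \vee
\left\|\frac{d\mathbf{x}}{dt}(\theta^*, \cdot) - \frac{d\mathbf{w}}{dt}\right\|_\infty \right),\\
&\sup_{\theta^* \in \Theta_0 \times \Gamma_0}\ \inf_{\mathbf{v} \in L_n,\ \mathbf{v}(0) = z}
\left( \|\mathbf{z}(\theta^*, \cdot) - \mathbf{v}\|_\infty \vee
\left\|\frac{d\mathbf{z}}{dt}(\theta^*, \cdot) - \frac{d\mathbf{v}}{dt}\right\|_\infty \right)
\Bigg\},
\end{aligned}
\tag{3.2}
$$

where $\theta^* = (\theta, x, z)$.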



Theorem 3.1. Under Assumptions 1-4, suppose that $\lambda_n \to \infty$ and
$r_n \to 0$ as $n \to \infty$. Then for any compact subset $\Theta_0$ of $\Theta$ and
any compact subset $\Gamma_0$ of $\Gamma$,



(3.3) 



sup ||x„(r,-)-x(r,-) 

6»*eeoxro 



< 



1 



T + 2rV8(8A'2 + 2)r 



^2KT 



where $K$ is a constant depending only on $\Theta_0 \times \Gamma_0$, $F$ and $G$.



Remark 5. 

(1) There is an exponential function of $T$ in the above upper bound. This is because we use the penalty (1.4). According to the approximation theory for ODEs (see Antosiewicz [1]), for general ODEs, however small the approximations make this penalty (even if we use the $L^\infty$-norm in the penalty), the difference between the approximations and the solutions can grow exponentially as $T$ increases. Here is a simple example. Consider the equation

\[
\frac{dy}{dt} = y, \qquad y(0) = y_0.
\]

The solution is $y(t) = y_0 e^t$. Now suppose we have an approximation $\hat{y}(t)$ to the solution, with $\hat{y}(0) = y_0$, satisfying

\[
\frac{d\hat{y}}{dt}(t) - \hat{y}(t) = \epsilon, \qquad t \ge 0,
\]

where $\epsilon$ is any fixed small number. Then we get

\[
\hat{y}(t) = y_0 e^t + \epsilon(e^t - 1), \qquad t \ge 0,
\]

so

\[
|\hat{y}(t) - y(t)| = |\epsilon|(e^t - 1), \qquad t \ge 0.
\]

The bound increases very fast as $T$ increases. In the simulated data examples for the FitzHugh-Nagumo equations in Ramsay et al. [21], they took $T = 20$. In this case, the above bound is too large to be useful for their sample size. However, the results in Ramsay et al. [21] and our simulation study indicate that when the smoothing parameter becomes large, the approximations to the solutions are very good. We will study this problem in the next section. Since $T$ is fixed, we can use this bound to get the asymptotic consistency and normality.

(2) If $L_n$ is the space of cubic spline functions with knots $\tau^{(n)}$, then by the proof of Lemma 1, we have

(3.4) r„ = 0(|T(")|='). 

(3) The upper bound is the sum of two terms. The second term is a function of the distance between the approximation space and the true solutions. It does not depend on the sample or on $\lambda$. This error term is due to the imperfect approximation by the basis expansion. The first term is due to using a finite $\lambda$. If $\lambda$ is finite, the solutions to the penalized optimization problem (2.1) are affected by the sample, so there are discrepancies between the solutions for finite $\lambda$ and the minimizer of $J$ (the solutions to the penalized optimization problem for $\lambda \to \infty$).

(4) In practice, numerical methods have to be used to calculate the penal- 
ties, which may lead to an increase in deviation of the approximations 
from the solutions. For example, Simpson's rule is used to compute the 
penalties in Ramsay et al. [21]. According to the error bound for Simp- 
son's rule and the proof of Theorem 3.1, one more term should be added 
to the right-hand side of (3.3), k\f\^/'^^/Te^^'^ , where K is a positive
number depending on the derivatives of Xn{0*,t) up to order 5 and f 
is the partition of [0,r] for Simpson's rule. If the solutions are smooth 
enough and the knots for B-splines are included in the partition points 
for Simpson's method, the error can be well controlled by adding enough 
partition points. 
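The simple example in item (1), $dy/dt = y$ with a constant residual $\epsilon$ in the differential equation, can be reproduced numerically. The sketch below integrates both the exact and the perturbed equation with a classical Runge-Kutta scheme and compares the gap with the closed form $|\epsilon|(e^t - 1)$. The step count, the value of $\epsilon$ and the helper names are illustrative assumptions, not part of the paper.

```python
import math


def rk4(f, y0, t_end, steps):
    """Integrate y' = f(t, y) from 0 to t_end with the classical RK4 scheme."""
    h = t_end / steps
    t, y = 0.0, y0
    for _ in range(steps):
        k1 = f(t, y)
        k2 = f(t + h / 2.0, y + h * k1 / 2.0)
        k3 = f(t + h / 2.0, y + h * k2 / 2.0)
        k4 = f(t + h, y + h * k3)
        y += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
        t += h
    return y


def error_gap(T, eps, y0=1.0, steps=20000):
    """|y_hat(T) - y(T)| where y' = y and y_hat' = y_hat + eps, same initial value."""
    exact = rk4(lambda t, y: y, y0, T, steps)
    perturbed = rk4(lambda t, y: y + eps, y0, T, steps)
    return abs(perturbed - exact)


eps = 1e-3  # illustrative size of the residual in the differential equation
for T in (1.0, 5.0, 10.0, 20.0):
    print(T, error_gap(T, eps), eps * (math.exp(T) - 1.0))
```

Even though the residual in the differential equation stays at $10^{-3}$ for all $t$, the gap between the two trajectories grows like $e^T$, which is why a small penalty alone cannot give a $T$-free bound for general ODEs.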

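In order to prove the consistency of the estimator $\hat\theta_n^*$, we replace Assumption 4 by a stronger assumption.

Assumption 5. $g(y,x)$ is a nonnegative function and belongs to $C^1(\mathbb{R}\times\mathbb{R})$ with

\[
E_{\theta_0^*}\Big|\frac{\partial g}{\partial x}(Y_i,\mathbf{x}(\theta_0^*,T_i))\Big| < \infty.
\]

If $Y_i$ is not a bounded random variable, we assume that for any compact set $A \subset \mathbb{R}$,

\[
\liminf_{|y|\to\infty} \frac{1 + \inf_{x\in A} g(y,x)}{\sup_{x\in A} g(y,x)} > 0
\qquad\text{and}\qquad
\liminf_{|y|\to\infty} \frac{1 + \inf_{x\in A} |\partial g/\partial x(y,x)|}{\sup_{x\in A} |\partial g/\partial x(y,x)|} > 0.
\]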



Remark 6. Both Examples 1 and 2 satisfy this assumption. 



Theorem 3.2. Suppose that Assumptions 1, 2, 3 and 5 hold and that $\hat\theta_n^*$ is uniformly tight. Suppose that $\lambda_n \to \infty$ and $r_n \to 0$ as $n \to \infty$. Then $\hat\theta_n^*$ is consistent.



Remark 7. 

(1) If $L_n$ is the space of cubic spline functions with knots $\tau^{(n)}$, then by (3.4) we only need $\lambda_n \to \infty$ and $|\tau^{(n)}| \to 0$ to obtain the consistency result.

(2) We say $\hat\theta_n^*$ is uniformly tight if for any $\epsilon > 0$, there exists a compact set $\Theta_\epsilon^* \subset \Theta \times \Gamma$ such that

\[
\sup_n P(\hat\theta_n^* \notin \Theta_\epsilon^*) < \epsilon.
\]

This is equivalent to stating that the probability of $\hat\theta_n^*$ going to the boundary or infinity is zero. The tightness assumption is essential for our proofs of the consistency and asymptotic normality. First, under this assumption, with a probability arbitrarily close to 1, the solutions can be uniformly bounded for any $n$. The weak convergence of the estimates also needs this assumption. If the parameter space $\Theta$ and the region $\Gamma$ where the initial values of the ODEs are taken are bounded and closed, then $\hat\theta_n^*$ is automatically uniformly tight. In the general case, it is not easy to verify this assumption. Sometimes, if $F$, $G$ go to infinity when $\theta$ goes to infinity or to the boundary of $\Theta$, $\hat\theta_n^*$ is uniformly tight.

(3) The proofs of the consistency and the asymptotic normality are based on the results for M-estimators. Because $\hat\theta_n^*$ is the maximizer of $H_n(\hat{\mathbf{x}}_n(\theta^*,\cdot))$, it is natural to use $H_n(\hat{\mathbf{x}}_n(\theta^*,\cdot))$ as our criterion function. But $\hat{\mathbf{x}}_n$ is the solution of a penalized optimization problem, so it is an implicit function of $\theta^*$. The derivatives of $\hat{\mathbf{x}}_n$ with respect to $\theta^*$ have complicated forms and are difficult to analyze. Because we have obtained the upper bound for the difference between $\hat{\mathbf{x}}_n$ and $\mathbf{x}$, we can choose the criterion function $H_n(\mathbf{x}(\theta^*,\cdot))$, which can be handled more easily. Although $\hat\theta_n^*$ is not the maximizer of $H_n(\mathbf{x}(\theta^*,\cdot))$, we can control the difference between $H_n(\mathbf{x}(\hat\theta_n^*,\cdot))$ and $\max_{\theta^*} H_n(\mathbf{x}(\theta^*,\cdot))$. In other words, $\hat\theta_n^*$ nearly maximizes $H_n(\mathbf{x}(\theta^*,\cdot))$ (see Section 5.2 of van der Vaart [25]), so we can apply the results for M-estimators.

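In order to prove the asymptotic normality of the estimator $\hat\theta_n^*$, we replace Assumptions 4 and 5 by a stronger assumption.

Assumption 6. $g(y,x)$ is a nonnegative function and belongs to $C^2(\mathbb{R}\times\mathbb{R})$ with

\[
E_{\theta_0^*}\Big|\frac{\partial g}{\partial x}(Y_i,\mathbf{x}(\theta_0^*,T_i))\Big|^2 < \infty
\qquad\text{and}\qquad
E_{\theta_0^*}\Big|\frac{\partial^2 g}{\partial x^2}(Y_i,\mathbf{x}(\theta_0^*,T_i))\Big| < \infty.
\]

If $Y_i$ is not a bounded random variable, we assume that for any compact set $A \subset \mathbb{R}$,

\[
\liminf_{|y|\to\infty} \frac{1 + \inf_{x\in A} g(y,x)}{\sup_{x\in A} g(y,x)} > 0,
\qquad
\liminf_{|y|\to\infty} \frac{1 + \inf_{x\in A} |\partial g/\partial x(y,x)|}{\sup_{x\in A} |\partial g/\partial x(y,x)|} > 0
\]

and

\[
\liminf_{|y|\to\infty} \frac{1 + \inf_{x\in A} |\partial^2 g/\partial x^2(y,x)|}{\sup_{x\in A} |\partial^2 g/\partial x^2(y,x)|} > 0.
\]

Remark 8. Both Examples 1 and 2 satisfy this assumption.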



Let the estimator $\tilde\theta_n^*$ be the maximizer of $H_n(\mathbf{x}(\theta^*,\cdot))$. Note that it is different from $\hat\theta_n^*$. If $g(y,\mathbf{x}(\theta,t))$ is the log density function of $(Y_i,T_i)$, then $\tilde\theta_n^*$ is the maximum likelihood estimator.

Theorem 3.3. Suppose that Assumptions 1, 2, 3 and 6 hold and that $\hat\theta_n^*$ and $\tilde\theta_n^*$ are uniformly tight. Suppose that

oo and r„ = Or, — as oo, 
and that the matrix



is nonsingular. Then both $\sqrt{n}(\hat\theta_n^* - \theta_0^*)$ and $\sqrt{n}(\tilde\theta_n^* - \theta_0^*)$ are asymptotically normal with mean zero and the same asymptotic covariance matrix.



Remark 9. 

(1) If L„ is the space of cubic spline functions with knots r^"). By (3.4), let 

^ — )• oo and |r^"^| = o„ ( ^ ,„ ) as n — )• oo. 

Then the conditions on and r„ in Theorem 3.3 are satisfied. 

(2) If $g(y,\mathbf{x}(\theta,t))$ is the log density function of $(Y_i,T_i)$, then $\tilde\theta_n^*$ is just the maximum likelihood estimator. Therefore, $\hat\theta_n^*$ is asymptotically efficient.

(3) The uniform tightness of $\tilde\theta_n^*$ is not needed in the proof of the asymptotic normality of $\hat\theta_n^*$.



4. The properties of the basis function approximations when $\lambda \to \infty$.

In this section, we study the finite-sample behavior of the approximations of the solutions of (1.1) for a given sample and a given value of $\theta^*$. In Theorem 3.1, we provide a bound on the uniform norm of the difference between the approximations and the solutions. But this bound grows exponentially as $T$ increases, due to the form of the penalty, which makes the bound useless in the finite-sample situation. It seems that the bound cannot be improved for general ODEs when the smoothing parameter is finite. However, the results in Ramsay et al. [21] and our simulation study indicate that when the smoothing parameter becomes large, the approximations to the solutions are quite good. Therefore, we let the smoothing parameter $\lambda$ go to infinity and study the limiting behavior of the approximations.

First we consider a simulated example. Consider the FitzHugh-Nagumo equations

\[
\frac{dV}{dt} = c\Big(V - \frac{V^3}{3} + R\Big), \qquad
\frac{dR}{dt} = -\frac{1}{c}(V - a + bR). \tag{4.1}
\]

The parameters in the system are $\theta = (a,b,c)$ and the time interval is $[0,20]$. Suppose that the true parameter values are $\theta_0 = (0.2, 0.2, 3)$ and the estimates are $\hat\theta = (0.8, -0.5, 3.5)$. Let $(V(\theta,\cdot), R(\theta,\cdot))$ be the solutions of (4.1) with parameter $\theta$ and initial value $(1,-1)$. The data were simulated from the model

\[
\begin{aligned}
Y_{1i} &= V(\theta_0,T_i) + \epsilon_{1i}, \\
Y_{2i} &= R(\theta_0,T_i) + \epsilon_{2i},
\end{aligned} \tag{4.2}
\]

where the samples were taken at times $0.0, 0.05, 0.10, \ldots, 20.0$ and $\epsilon_{1i}, \epsilon_{2i}$ were independent random variables with the same distribution $N(0, 0.5)$. We first plot the solutions for $\theta_0$ and $\hat\theta$, and the simulated data, in Figure 1. Now, we fix the sample and let $L$ be the cubic spline functions with knots at each sample point. Let the smoothing parameter $\lambda$ go to infinity. We plot the differences between the spline approximations $(\hat X_1(t), \hat X_2(t))$ and the solutions $(V(t),R(t))$ for $\hat\theta = (0.8,-0.5,3.5)$ for different values of $\lambda$ in Figure 2. From this figure, we can see that the approximations to the solutions are quite good when $\lambda$ is large and there is no obvious time trend for the difference.

Fig. 1. The solutions $(V,R)$ of (4.1) for $\theta_0 = (0.2,0.2,3)$ and $\hat\theta = (0.8,-0.5,3.5)$, and the simulated data $(Y_1,Y_2)$ from (4.2).

Fig. 2. The differences between the spline approximations $(\hat X_1(t), \hat X_2(t))$ and the solution $(V(t),R(t))$ for $\hat\theta = (0.8,-0.5,3.5)$ for different values of $\lambda$. The graphs have different y-axis scales.
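The simulation design described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: it assumes the FitzHugh-Nagumo system in the standard form $dV/dt = c(V - V^3/3 + R)$, $dR/dt = -(V - a + bR)/c$, integrates it with a hand-written Runge-Kutta scheme, and treats the noise parameter 0.5 as a standard deviation (the text does not say whether 0.5 is a variance or a standard deviation). The function names, the substep count and the random seed are hypothetical.

```python
import random


def fhn_rhs(state, theta):
    """Right-hand side of the FitzHugh-Nagumo system in the form assumed above."""
    v, r = state
    a, b, c = theta
    return (c * (v - v ** 3 / 3.0 + r), -(v - a + b * r) / c)


def rk4_step(state, theta, h):
    """One classical Runge-Kutta step of size h."""
    def shifted(k, scale):
        return tuple(s + scale * ki for s, ki in zip(state, k))

    k1 = fhn_rhs(state, theta)
    k2 = fhn_rhs(shifted(k1, h / 2.0), theta)
    k3 = fhn_rhs(shifted(k2, h / 2.0), theta)
    k4 = fhn_rhs(shifted(k3, h), theta)
    return tuple(s + h / 6.0 * (a1 + 2.0 * a2 + 2.0 * a3 + a4)
                 for s, a1, a2, a3, a4 in zip(state, k1, k2, k3, k4))


def solve_fhn(theta, init, t_end=20.0, dt=0.05, substeps=20):
    """Return [(t, V, R)] on the sampling grid 0, dt, ..., t_end."""
    n = round(t_end / dt)
    h = dt / substeps
    state = tuple(init)
    path = [(0.0, state[0], state[1])]
    for i in range(1, n + 1):
        for _ in range(substeps):
            state = rk4_step(state, theta, h)
        path.append((i * dt, state[0], state[1]))
    return path


def simulate_data(theta, init, sd=0.5, seed=0):
    """Add independent Gaussian noise (standard deviation sd, an assumption) to V and R."""
    rng = random.Random(seed)
    return [(t, v + rng.gauss(0.0, sd), r + rng.gauss(0.0, sd))
            for t, v, r in solve_fhn(theta, init)]


theta_0 = (0.2, 0.2, 3.0)   # true parameter values from the text
init = (1.0, -1.0)          # initial value as printed in the text
true_path = solve_fhn(theta_0, init)
data = simulate_data(theta_0, init)
```

Calling `solve_fhn` with the estimate $(0.8, -0.5, 3.5)$ instead of `theta_0` gives the second trajectory of the comparison; fitting the penalized splines themselves is outside the scope of this sketch.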

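Fix the sample $(Y_1,T_1), \ldots, (Y_n,T_n)$ and the parameter $\theta^*$. Let $L$ be a finite-dimensional linear subspace of $C^1[0,T]$. Define

\[
\begin{aligned}
r = \max\Big\{ &\inf_{w\in L,\,w(0)=x}
\Big( \|\mathbf{x}(\theta^*,\cdot)-w\|_\infty \vee \Big\|\frac{d\mathbf{x}}{dt}(\theta^*,\cdot)-\frac{dw}{dt}\Big\|_\infty \Big), \\
&\inf_{v\in L,\,v(0)=z}
\Big( \|\mathbf{z}(\theta^*,\cdot)-v\|_\infty \vee \Big\|\frac{d\mathbf{z}}{dt}(\theta^*,\cdot)-\frac{dv}{dt}\Big\|_\infty \Big) \Big\}.
\end{aligned} \tag{4.3}
\]

Let $\lambda^{(m)}$ be a sequence of positive smoothing parameters which is strictly increasing and goes to infinity. For each $m$, let

\[
(\hat{\mathbf{x}}^{(m)}, \hat{\mathbf{z}}^{(m)}) \in \operatorname*{argmax}_{\substack{\hat{\mathbf{x}},\hat{\mathbf{z}}\in L \\ \hat{\mathbf{x}}(0)=x,\ \hat{\mathbf{z}}(0)=z}} \big[ H_n(\hat{\mathbf{x}}) - \lambda^{(m)} J(\hat{\mathbf{x}},\hat{\mathbf{z}},\theta) \big].
\]

Note that we suppress the subscript $n$ on $(\hat{\mathbf{x}}^{(m)}(\theta^*,t), \hat{\mathbf{z}}^{(m)}(\theta^*,t))$, $r$ and $L$, which may depend on the sample, because the sample is fixed.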

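Lemma 2. For each $m$, we have

\[
\lambda^{(m+1)} \big[ J(\hat{\mathbf{x}}^{(m)},\hat{\mathbf{z}}^{(m)},\theta) - J(\hat{\mathbf{x}}^{(m+1)},\hat{\mathbf{z}}^{(m+1)},\theta) \big]
\ge H_n(\hat{\mathbf{x}}^{(m)}) - H_n(\hat{\mathbf{x}}^{(m+1)})
\ge \lambda^{(m)} \big[ J(\hat{\mathbf{x}}^{(m)},\hat{\mathbf{z}}^{(m)},\theta) - J(\hat{\mathbf{x}}^{(m+1)},\hat{\mathbf{z}}^{(m+1)},\theta) \big].
\]

Therefore, both $\{H_n(\hat{\mathbf{x}}^{(m)}) : m \ge 1\}$ and $\{J(\hat{\mathbf{x}}^{(m)},\hat{\mathbf{z}}^{(m)},\theta) : m \ge 1\}$ are decreasing sequences.

Proof. By the definitions of $(\hat{\mathbf{x}}^{(m)},\hat{\mathbf{z}}^{(m)})$ and $(\hat{\mathbf{x}}^{(m+1)},\hat{\mathbf{z}}^{(m+1)})$, we have

\[
\begin{aligned}
H_n(\hat{\mathbf{x}}^{(m)}) - \lambda^{(m+1)} J(\hat{\mathbf{x}}^{(m)},\hat{\mathbf{z}}^{(m)},\theta) &\le H_n(\hat{\mathbf{x}}^{(m+1)}) - \lambda^{(m+1)} J(\hat{\mathbf{x}}^{(m+1)},\hat{\mathbf{z}}^{(m+1)},\theta), \\
H_n(\hat{\mathbf{x}}^{(m)}) - \lambda^{(m)} J(\hat{\mathbf{x}}^{(m)},\hat{\mathbf{z}}^{(m)},\theta) &\ge H_n(\hat{\mathbf{x}}^{(m+1)}) - \lambda^{(m)} J(\hat{\mathbf{x}}^{(m+1)},\hat{\mathbf{z}}^{(m+1)},\theta).
\end{aligned}
\]

The two inequalities in the lemma follow immediately. Note that $\lambda^{(m+1)} > \lambda^{(m)}$, so we have

\[
J(\hat{\mathbf{x}}^{(m)},\hat{\mathbf{z}}^{(m)},\theta) - J(\hat{\mathbf{x}}^{(m+1)},\hat{\mathbf{z}}^{(m+1)},\theta) \ge 0,
\]

and then the second inequality of the lemma gives $H_n(\hat{\mathbf{x}}^{(m)}) - H_n(\hat{\mathbf{x}}^{(m+1)}) \ge 0$. □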



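Lemma 3.

\[
\lim_{m\to\infty} J(\hat{\mathbf{x}}^{(m)},\hat{\mathbf{z}}^{(m)},\theta)
= \inf_{\substack{\hat{\mathbf{x}},\hat{\mathbf{z}}\in L \\ \hat{\mathbf{x}}(0)=x,\ \hat{\mathbf{z}}(0)=z}} J(\hat{\mathbf{x}},\hat{\mathbf{z}},\theta).
\]

Proof. By Lemma 2, $\{J(\hat{\mathbf{x}}^{(m)},\hat{\mathbf{z}}^{(m)},\theta) : m \ge 1\}$ is a decreasing and nonnegative sequence, hence the limit exists. Now suppose that the equality is not true. Then we can find $(\tilde{\mathbf{x}},\tilde{\mathbf{z}}) \in L$ with $\tilde{\mathbf{x}}(0) = x$, $\tilde{\mathbf{z}}(0) = z$, and a number $\eta > 0$ such that

\[
\lim_{m\to\infty} J(\hat{\mathbf{x}}^{(m)},\hat{\mathbf{z}}^{(m)},\theta) > J(\tilde{\mathbf{x}},\tilde{\mathbf{z}},\theta) + \eta.
\]

For any $m$, we have

\[
H_n(\hat{\mathbf{x}}^{(m)}) - \lambda^{(m)} J(\hat{\mathbf{x}}^{(m)},\hat{\mathbf{z}}^{(m)},\theta) \ge H_n(\tilde{\mathbf{x}}) - \lambda^{(m)} J(\tilde{\mathbf{x}},\tilde{\mathbf{z}},\theta).
\]

Then, because the sequence $J(\hat{\mathbf{x}}^{(m)},\hat{\mathbf{z}}^{(m)},\theta)$ decreases to its limit,

\[
H_n(\hat{\mathbf{x}}^{(m)}) - H_n(\tilde{\mathbf{x}}) \ge \lambda^{(m)} \big[ J(\hat{\mathbf{x}}^{(m)},\hat{\mathbf{z}}^{(m)},\theta) - J(\tilde{\mathbf{x}},\tilde{\mathbf{z}},\theta) \big] \ge \lambda^{(m)} \eta.
\]

Note that $H_n(\cdot)$ is nonpositive, so we have

\[
-H_n(\tilde{\mathbf{x}}) \ge \lambda^{(m)} \eta.
\]

The left-hand side is a fixed number and the right-hand side goes to infinity as $m \to \infty$. This is a contradiction. Hence, the lemma is true. □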
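Lemmas 2 and 3 can be seen at work in a one-parameter toy problem where the penalized maximizer has a closed form. The sketch below is an illustration under stated assumptions, not the spline problem of the paper: it takes the hypothetical criterion $H(c) = -(c - y)^2$ and penalty $J(c) = (c - s)^2$ for a scalar $c$, so that the maximizer of $H(c) - \lambda J(c)$ is $c(\lambda) = (y + \lambda s)/(1 + \lambda)$, and then checks the monotonicity and the sandwich inequality along an increasing sequence of $\lambda$ values.

```python
def penalized_maximizer(lam, y=2.0, s=0.5):
    """Closed-form argmax over c of H(c) - lam * J(c) for the toy H and J below."""
    return (y + lam * s) / (1.0 + lam)


def H(c, y=2.0):
    # Toy nonpositive criterion (plays the role of H_n).
    return -(c - y) ** 2


def J(c, s=0.5):
    # Toy nonnegative penalty (plays the role of J); its infimum is 0 at c = s.
    return (c - s) ** 2


lams = [10.0 ** k for k in range(-2, 7)]       # strictly increasing, growing large
cs = [penalized_maximizer(lam) for lam in lams]
H_vals = [H(c) for c in cs]
J_vals = [J(c) for c in cs]

for lam, h_val, j_val in zip(lams, H_vals, J_vals):
    print(lam, h_val, j_val)
```

In the printed table both columns decrease with $\lambda$, and the penalty column tends to its infimum 0, matching the statements of Lemma 2 and Lemma 3 for this toy case.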

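Lemma 4. Suppose that both $\hat{\mathbf{x}}^{(m)}$ and $\hat{\mathbf{z}}^{(m)}$ are bounded sequences in $(L, \|\cdot\|_\infty)$. Then for any subsequence $\{(\hat{\mathbf{x}}^{(m')},\hat{\mathbf{z}}^{(m')})\} \subset \{(\hat{\mathbf{x}}^{(m)},\hat{\mathbf{z}}^{(m)})\}$, there exist a further subsequence $\{(\hat{\mathbf{x}}^{(m'')},\hat{\mathbf{z}}^{(m'')})\} \subset \{(\hat{\mathbf{x}}^{(m')},\hat{\mathbf{z}}^{(m')})\}$ and

\[
(\hat{\mathbf{x}}^{(\infty)},\hat{\mathbf{z}}^{(\infty)}) \in \operatorname*{argmin}_{\substack{\hat{\mathbf{x}},\hat{\mathbf{z}}\in L \\ \hat{\mathbf{x}}(0)=x,\ \hat{\mathbf{z}}(0)=z}} J(\hat{\mathbf{x}},\hat{\mathbf{z}},\theta),
\]

such that

\[
\lim_{m''\to\infty} \|\hat{\mathbf{x}}^{(m'')} - \hat{\mathbf{x}}^{(\infty)}\|_\infty = 0, \qquad
\lim_{m''\to\infty} \|\hat{\mathbf{z}}^{(m'')} - \hat{\mathbf{z}}^{(\infty)}\|_\infty = 0.
\]

Proof. Because $(L, \|\cdot\|_\infty)$ is a finite-dimensional subspace of $(C^1[0,T], \|\cdot\|_\infty)$, any bounded and closed subset of $(L, \|\cdot\|_\infty)$ is compact. Then our conclusion follows from Lemma 3 and the fact that $J(\hat{\mathbf{x}},\hat{\mathbf{z}},\theta)$ is a continuous function of $(\hat{\mathbf{x}},\hat{\mathbf{z}})$ in $(L, \|\cdot\|_\infty)$ (note that it is not continuous in $(C^1[0,T], \|\cdot\|_\infty)$). □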

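Lemma 5. Suppose that the equations in (1.1) have unique solutions $(\mathbf{x},\mathbf{z})$. For any $M > 0$ and $\delta > 0$, there exists a positive number $\epsilon$ depending on $M$ and $\delta$, such that if $r < \epsilon$, then for any

\[
(\hat{\mathbf{x}}^{(\infty)},\hat{\mathbf{z}}^{(\infty)}) \in \operatorname*{argmin}_{\substack{\hat{\mathbf{x}},\hat{\mathbf{z}}\in L,\ \hat{\mathbf{x}}(0)=x,\ \hat{\mathbf{z}}(0)=z \\ \|\hat{\mathbf{x}}\|_\infty \le M,\ \|\hat{\mathbf{z}}\|_\infty \le M}} J(\hat{\mathbf{x}},\hat{\mathbf{z}},\theta),
\]

we have

\[
\|\hat{\mathbf{x}}^{(\infty)} - \mathbf{x}\|_\infty < \delta, \qquad \|\hat{\mathbf{z}}^{(\infty)} - \mathbf{z}\|_\infty < \delta.
\]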



Now, we study the minimum points of $J$ in a neighborhood of the solutions. We can only give the result for the one-dimensional case and we need some properties of the basis functions. So we will assume in the next theorem that the subspace $L$ is the space of B-spline functions with order at least 4. Let $\tau = (0 = t_1 < \cdots < t_k = T)$ be the knots of $L$. Recall that

\[
|\tau| = \max_{2\le i\le k} |t_i - t_{i-1}|.
\]

We define the mesh ratio

\[
\frac{\max_{2\le i\le k} |t_i - t_{i-1}|}{\min_{2\le i\le k} |t_i - t_{i-1}|}.
\]
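Both quantities are simple functions of the knot spacings. The following sketch computes the mesh size $|\tau|$ and the mesh ratio for a knot sequence; the function names are introduced here for illustration and the example knots are arbitrary.

```python
def knot_gaps(knots):
    """Spacings t_i - t_{i-1}; the knots must be strictly increasing."""
    gaps = [b - a for a, b in zip(knots, knots[1:])]
    if not gaps or min(gaps) <= 0.0:
        raise ValueError("need at least two strictly increasing knots")
    return gaps


def mesh_size(knots):
    """|tau|: the largest spacing between adjacent knots."""
    return max(knot_gaps(knots))


def mesh_ratio(knots):
    """Largest spacing divided by smallest spacing (equals 1 for uniform knots)."""
    gaps = knot_gaps(knots)
    return max(gaps) / min(gaps)


uneven = [0.0, 0.5, 1.5, 2.0]
print(mesh_size(uneven), mesh_ratio(uneven))
```

For equally spaced knots, such as knots placed at equally spaced sampling times, the mesh ratio is 1 up to rounding.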

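Suppose that $\mathbf{x}$ is the solution of the equation

\[
\frac{d\mathbf{x}}{dt}(t) = F(\mathbf{x}(t),t), \qquad \mathbf{x}(0) = x, \tag{4.4}
\]

where $F$ is a function on $\mathbb{R}\times\mathbb{R}$. We use $F_x$, $F_t$ to denote the partial derivatives of $F$ with respect to $x$ and $t$, respectively. Define

\[
J(\hat{\mathbf{x}}) = \int_0^T \Big[ \frac{d\hat{\mathbf{x}}}{dt}(t) - F(\hat{\mathbf{x}}(t),t) \Big]^2 dt \qquad \forall \hat{\mathbf{x}} \in L
\]

and

\[
r = \inf_{w\in L,\,w(0)=x} \|\mathbf{x} - w\|_\infty \vee \Big\| \frac{d\mathbf{x}}{dt} - \frac{dw}{dt} \Big\|_\infty. \tag{4.5}
\]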
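The penalty $J$ for a one-dimensional equation can be evaluated numerically for any candidate function whose derivative is available. The sketch below is an illustration under stated assumptions: it takes $J$ to be the integrated squared ODE residual over $[0,T]$, uses composite Simpson's rule (the quadrature rule mentioned in Remark 5(4)), and picks the hypothetical right-hand side $F(x,t) = -x$, which has $F_x = -1 < 0$. The function names and test candidates are not from the paper.

```python
import math


def simpson(f, a, b, n):
    """Composite Simpson's rule on [a, b] with an even number n of subintervals."""
    if n <= 0 or n % 2 != 0:
        raise ValueError("n must be a positive even integer")
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * f(a + i * h)
    return total * h / 3.0


def penalty_J(x_hat, dx_hat, F, T, n=200):
    """Integrated squared residual of the ODE x' = F(x, t) for the candidate x_hat."""
    return simpson(lambda t: (dx_hat(t) - F(x_hat(t), t)) ** 2, 0.0, T, n)


def F(x, t):
    # Hypothetical right-hand side with F_x = -1 < 0.
    return -x


# The exact solution x(t) = exp(-t) has zero residual.
J_exact = penalty_J(lambda t: math.exp(-t), lambda t: -math.exp(-t), F, 1.0)
# The line 1 - t has residual -t, so J equals the integral of t^2 over [0, 1].
J_line = penalty_J(lambda t: 1.0 - t, lambda t: -1.0, F, 1.0)
print(J_exact, J_line)
```

A candidate from a spline space would be plugged in the same way, with `x_hat` and `dx_hat` evaluating the spline and its derivative.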



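Theorem 4.1. Assume that $\mathbf{x}$ is the unique solution of (4.4), $F$ has third-order continuous partial derivatives and

\[
F_x(\mathbf{x}(t),t) < 0 \qquad \forall\, 0 \le t \le T.
\]

Suppose that $L$ is the cubic spline space with knots $\tau$ and $|\tau| < 1$. Then there exists a positive number $\delta_2$ depending only on $\mathbf{x}$ and $F$ such that, for any

\[
\mathbf{x}_0 \in \operatorname*{argmin}_{\substack{\hat{\mathbf{x}}\in L,\ \hat{\mathbf{x}}(0)=x \\ \|\hat{\mathbf{x}}-\mathbf{x}\|_\infty \le \delta_2}} J(\hat{\mathbf{x}}),
\]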



we have 



xol 



d xo 



df^ 



r|^/2 + /32'«|r| 



L2[o,T] 

(4.6) +K(4^/6^ + /33)/34\/r|T|3/2 

+ /?5\/r|r|3 + /36\/T|r|^/^ 
where $\beta_1, \beta_2, \beta_3, \beta_4, \beta_5, \beta_6$ are constants depending only on $\mathbf{x}$ and $F$.



Remark 10. 

(1) Theorem 4.1 is true for any space of the B-spline functions with order 
larger than 4. However, the order of the bound on the right-hand side of 
(4.6) may be different for higher order B-spline functions. We conjecture 
that there are similar results for high-dimensional cases and more general 
equations. 
(2) If $|\tau|$ is small enough, the bound is dominated by the first term, which
depends on T only through ||^p*^||l2[o,t]- If is bounded by some 
fixed constant, then we can prove that the first term will be 0(|t|) and 
does not depend on T. 

(3) Now, we can explain the pattern in Figure 2. According to Lemma 5, when $|\tau|$ is small enough, all the minimum points of $J$ will be in the neighborhood of the solution $(\mathbf{x},\mathbf{z})$. By Theorem 4.1, all these minimum points will satisfy the bound (4.6). By Lemma 4, for any subsequence of $(\hat{\mathbf{x}}^{(m)},\hat{\mathbf{z}}^{(m)})$, there exists a further subsequence converging to one of these minimum points. Hence, it is easy to show that $(\hat{\mathbf{x}}^{(m)},\hat{\mathbf{z}}^{(m)})$ will satisfy the bound (4.6) for all large $m$.

(4) Here we outline the proof of this theorem. The first step is to derive a bound for $\|\mathbf{x}(t_0) - \mathbf{x}_0(t_0)\|$ at any stationary point $t_0$ of the function $\mathbf{x}(t) - \mathbf{x}_0(t)$, that is, any point $t_0$ such that $\frac{d\mathbf{x}}{dt}(t_0) - \frac{d\mathbf{x}_0}{dt}(t_0) = 0$. In this step, we use the property of $\mathbf{x}_0$ as a local minimizer of $J(\hat{\mathbf{x}})$. One of the key points in this step is based on the following observation: for any continuously differentiable function $f$ on $[0,T]$, if there is a zero point of $f$ between $a$ and $b$, then we have

\[
\int_a^b f(t)\,dt = O(|b-a|^2).
\]

We apply this observation to $\frac{d\mathbf{x}}{dt} - \frac{d\mathbf{x}_0}{dt}$. In high-dimensional cases, the stationary points for different components of $\mathbf{x} - \mathbf{x}_0$ may not be the same, so we cannot extend the idea to high-dimensional cases. In this step, we just need the condition that $F_x(\mathbf{x}(t),t)$ is nonzero and do not require that it be strictly negative.

The next step is to give a bound for all the other points in $[0,T]$. In this step, we need the condition that $F_x(\mathbf{x}(t),t)$ is strictly negative to control the growth of $\|\mathbf{x}(t) - \mathbf{x}_0(t)\|$ when $t$ moves away from a stationary point.

(5) In the proof, we use some properties of the B-spline bases. For example, the B-spline bases are stable bases and locally supported. Because we need to change the order of the differentiation and the integral in the proof, our proof cannot be applied to the penalty where the $L^1$- or $L^\infty$-norm is used.
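The observation used in item (4), that the integral over $[a,b]$ of a continuously differentiable function with a zero in $[a,b]$ is $O(|b-a|^2)$, can be checked numerically. The sketch below is an illustration with hypothetical choices: it uses $f(t) = \sin(t - a)$, which vanishes at $t = a$ and has $|f'| \le 1$, and compares it with the constant function 1, which has no zero. Since $|f(t)| \le |t - a|$, the scaled integral cannot exceed $1/2$.

```python
import math


def simpson(f, a, b, n=100):
    """Composite Simpson's rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * f(a + i * h)
    return total * h / 3.0


def scaled_integral(f, a, length):
    """|integral of f over [a, a + length]| divided by length^2."""
    return abs(simpson(f, a, a + length)) / length ** 2


a = 0.3  # arbitrary left endpoint, chosen for illustration
with_zero = [scaled_integral(lambda t: math.sin(t - a), a, h) for h in (1.0, 0.1, 0.01)]
without_zero = [scaled_integral(lambda t: 1.0, a, h) for h in (1.0, 0.1, 0.01)]
print(with_zero)
print(without_zero)
```

The first list stays bounded as the interval shrinks, while the second grows like $1/|b-a|$, which shows why the existence of a zero point matters for the order of the integral.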

In Section 3, we prove the consistency and asymptotic normality based on the following idea: the likelihood functions can be well approximated if we have good approximations to the solutions of the ODEs. Therefore, in practical applications, we should make the approximations close enough to the solutions. In some cases, if we do not want to solve the ODEs, we can use Theorem 4.1 to estimate the deviation of the spline approximations from the solutions. We select a basis that is sufficiently rich to make the uniform norm in Theorem 4.1 very small. For any given $\lambda$ and $\theta^*$, if we want to estimate $\|\hat{\mathbf{x}}(\theta^*,\lambda,\cdot) - \mathbf{x}(\theta^*,\cdot)\|$, we pick a sequence $\lambda_n \to \infty$. According to Lemma 4, we can find a subsequence $\lambda_{n'}$ and $\hat{\mathbf{x}}(\theta^*,\infty,\cdot)$ such that $\lim_{n'\to\infty} \hat{\mathbf{x}}(\theta^*,\lambda_{n'},\cdot) = \hat{\mathbf{x}}(\theta^*,\infty,\cdot)$. By Theorem 4.1, we can use $\|\hat{\mathbf{x}}(\theta^*,\lambda,\cdot) - \hat{\mathbf{x}}(\theta^*,\infty,\cdot)\|$ to approximate $\|\hat{\mathbf{x}}(\theta^*,\lambda,\cdot) - \mathbf{x}(\theta^*,\cdot)\|$. In our simulation study, the sequence $\hat{\mathbf{x}}(\theta^*,\lambda_n,\cdot)$ usually converges, and there is no need to find the subsequence.



5. Proofs. 



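Proof of Theorem 3.1. Let two compact sets $\Theta_0$ and $\Gamma_0$ be given. Without loss of generality, we assume that $\Theta_0$ and $\Gamma_0$ are convex and contain $\theta_0$ and $(x_0,z_0)$; otherwise we can prove the conclusion for the convex hulls generated by $\Theta_0 \cup \{\theta_0\}$ and $\Gamma_0 \cup \{(x_0,z_0)\}$, which are still compact sets. Let $r_n$ be the number defined in (3.2). Since $r_n \to 0$, without loss of generality, we assume that $r_n < 1$. By the definition of $r_n$, for each $\theta^* \in \Theta_0 \times \Gamma_0$, there exist $w_n(\theta^*,\cdot), v_n(\theta^*,\cdot) \in L_n$ such that

\[
\begin{aligned}
\|\mathbf{x}(\theta^*,\cdot) - w_n(\theta^*,\cdot)\|_\infty \vee \Big\| \frac{d\mathbf{x}}{dt}(\theta^*,\cdot) - \frac{dw_n}{dt}(\theta^*,\cdot) \Big\|_\infty &\le 2r_n, \\
\|\mathbf{z}(\theta^*,\cdot) - v_n(\theta^*,\cdot)\|_\infty \vee \Big\| \frac{d\mathbf{z}}{dt}(\theta^*,\cdot) - \frac{dv_n}{dt}(\theta^*,\cdot) \Big\|_\infty &\le 2r_n,
\end{aligned} \tag{5.1}
\]

and $w_n(\theta^*,0) = \mathbf{x}(\theta^*,0) = x$, $v_n(\theta^*,0) = \mathbf{z}(\theta^*,0) = z$, where $\theta^* = (\theta,x,z)$. Because $(\mathbf{x}(\theta^*,t),\mathbf{z}(\theta^*,t))$ are continuous functions of $(\theta^*,t)$ and $\Theta_0 \times \Gamma_0 \times [0,T]$ is a compact set, there exists a positive number $R$ depending on $F$, $G$ and $\Theta_0$, $\Gamma_0$, such that

\[
|\mathbf{x}(\theta^*,t)| \le R, \quad |\mathbf{z}(\theta^*,t)| \le R \qquad \forall (\theta^*,t) \in \Theta_0 \times \Gamma_0 \times [0,T]. \tag{5.2}
\]

By (5.1),

\[
|w_n(\theta^*,t)| \le R + 2r_n < R + 2, \quad |v_n(\theta^*,t)| \le R + 2r_n < R + 2 \qquad \forall (\theta^*,t) \in \Theta_0 \times \Gamma_0 \times [0,T]. \tag{5.3}
\]

Since $F$, $G$ have continuous partial derivatives, we can find a positive number $K$ depending on $F$, $G$ and $\Theta_0$, $\Gamma_0$, such that

\[
\begin{aligned}
|F(x,z,t,\theta) - F(x',z',t,\theta)| &\le K|x - x'| + K|z - z'|, \\
|G(x,z,t,\theta) - G(x',z',t,\theta)| &\le K|x - x'| + K|z - z'|
\end{aligned} \tag{5.4}
\]

for all $|x| \le R+3$, $|x'| \le R+3$, $|z| \le R+3$, $|z'| \le R+3$.

We first prove a technical lemma.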

We first prove a technical lemma. 



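Lemma 6. $\sup_{\theta^*\in\Theta_0\times\Gamma_0} P_n g(Y, w_n(\theta^*,\cdot)) = O_p(1)$.

Proof. By (5.3), $w_n(\theta^*,t)$ are uniformly bounded by $R + 2$ for all $(\theta^*,t) \in \Theta_0 \times \Gamma_0 \times [0,T]$. So if $Y_i$ is bounded, $\sup_{\theta^*\in\Theta_0\times\Gamma_0} P_n g(Y, w_n(\theta^*,\cdot))$ is bounded since $g$ is continuous, and hence the lemma is true. Otherwise, by Assumption 4, we can find a positive number $\delta > 0$ such that

\[
\delta \sup_{|x|\le R+2} g(y,x) \le 1 + \inf_{|x|\le R+2} g(y,x) \qquad \forall y \in \mathbb{R}.
\]

Hence, we have

\[
\sup_{\theta^*\in\Theta_0\times\Gamma_0} P_n g(Y, w_n(\theta^*,\cdot)) \le \frac{1}{\delta}\big(1 + P_n g(Y,\mathbf{x}(\theta_0^*,\cdot))\big).
\]

By the law of large numbers,

\[
\frac{1}{\delta}\big(1 + P_n g(Y,\mathbf{x}(\theta_0^*,\cdot))\big) = O_p(1).
\]

□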



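Now, we prove Theorem 3.1. For any $\theta^* \in \Theta_0 \times \Gamma_0$, by the definition of $(\hat{\mathbf{x}}_n(\theta^*,\cdot), \hat{\mathbf{z}}_n(\theta^*,\cdot))$, we have

\[
H_n(\hat{\mathbf{x}}_n(\theta^*,\cdot)) - \lambda_n J(\hat{\mathbf{x}}_n(\theta^*,\cdot),\hat{\mathbf{z}}_n(\theta^*,\cdot),\theta)
\ge H_n(w_n(\theta^*,\cdot)) - \lambda_n J(w_n(\theta^*,\cdot),v_n(\theta^*,\cdot),\theta).
\]

Because $H_n(\hat{\mathbf{x}}_n(\theta^*,\cdot)) \le 0$,

\[
-\lambda_n J(\hat{\mathbf{x}}_n(\theta^*,\cdot),\hat{\mathbf{z}}_n(\theta^*,\cdot),\theta)
\ge H_n(w_n(\theta^*,\cdot)) - \lambda_n J(w_n(\theta^*,\cdot),v_n(\theta^*,\cdot),\theta).
\]

Then, suppressing the argument $(\theta^*,\cdot)$ of $\hat{\mathbf{x}}_n$, $\hat{\mathbf{z}}_n$, $w_n$, $v_n$, $\mathbf{x}$ and $\mathbf{z}$ in the displays,

\[
\begin{aligned}
J(\hat{\mathbf{x}}_n,\hat{\mathbf{z}}_n,\theta)
&\le -\frac{1}{\lambda_n} H_n(w_n) + J(w_n,v_n,\theta) \\
&\le \frac{1}{\lambda_n} P_n g(Y,w_n)
 + \Big\| \frac{dw_n}{dt} - F(w_n,v_n,t,\theta) \Big\|^2_{L^2[0,T]}
 + \Big\| \frac{dv_n}{dt} - G(w_n,v_n,t,\theta) \Big\|^2_{L^2[0,T]} \\
&\le \frac{1}{\lambda_n} P_n g(Y,w_n)
 + \Big\| \frac{dw_n}{dt} - \frac{d\mathbf{x}}{dt} + F(\mathbf{x},\mathbf{z},t,\theta) - F(w_n,v_n,t,\theta) \Big\|^2_{L^2[0,T]} \\
&\qquad + \Big\| \frac{dv_n}{dt} - \frac{d\mathbf{z}}{dt} + G(\mathbf{x},\mathbf{z},t,\theta) - G(w_n,v_n,t,\theta) \Big\|^2_{L^2[0,T]}
 \qquad (\mathbf{x}, \mathbf{z} \text{ are solutions}) \\
&\le \frac{1}{\lambda_n} P_n g(Y,w_n)
 + 2\Big\| \frac{dw_n}{dt} - \frac{d\mathbf{x}}{dt} \Big\|^2_{L^2[0,T]}
 + 2\Big\| \frac{dv_n}{dt} - \frac{d\mathbf{z}}{dt} \Big\|^2_{L^2[0,T]} \\
&\qquad + 2\| F(\mathbf{x},\mathbf{z},t,\theta) - F(w_n,v_n,t,\theta) \|^2_{L^2[0,T]}
 + 2\| G(\mathbf{x},\mathbf{z},t,\theta) - G(w_n,v_n,t,\theta) \|^2_{L^2[0,T]} \\
&\le \frac{1}{\lambda_n} P_n g(Y,w_n)
 + 2\Big\| \frac{dw_n}{dt} - \frac{d\mathbf{x}}{dt} \Big\|^2_{L^2[0,T]}
 + 2\Big\| \frac{dv_n}{dt} - \frac{d\mathbf{z}}{dt} \Big\|^2_{L^2[0,T]} \\
&\qquad + 8K^2 \| w_n - \mathbf{x} \|^2_{L^2[0,T]} + 8K^2 \| v_n - \mathbf{z} \|^2_{L^2[0,T]}
 \qquad \text{by (5.4)} \\
&\le \frac{1}{\lambda_n} P_n g(Y,w_n)
 + 2T\Big\| \frac{dw_n}{dt} - \frac{d\mathbf{x}}{dt} \Big\|^2_\infty
 + 2T\Big\| \frac{dv_n}{dt} - \frac{d\mathbf{z}}{dt} \Big\|^2_\infty \\
&\qquad + 8K^2 T \| w_n - \mathbf{x} \|^2_\infty + 8K^2 T \| v_n - \mathbf{z} \|^2_\infty
 \qquad \text{by (1.2)} \\
&\le \frac{1}{\lambda_n} P_n g(Y,w_n)
 + (8K^2+2)T \Big( \Big\| \frac{dw_n}{dt} - \frac{d\mathbf{x}}{dt} \Big\|_\infty \vee \| w_n - \mathbf{x} \|_\infty \Big)^2 \\
&\qquad + (8K^2+2)T \Big( \Big\| \frac{dv_n}{dt} - \frac{d\mathbf{z}}{dt} \Big\|_\infty \vee \| v_n - \mathbf{z} \|_\infty \Big)^2 \\
&\le \frac{1}{\lambda_n} P_n g(Y,w_n) + 8T(8K^2+2) r_n^2
 \qquad \text{by (5.1)}.
\end{aligned}
\]

Hence

\[
\sup_{\theta^*\in\Theta_0\times\Gamma_0} J(\hat{\mathbf{x}}_n(\theta^*,\cdot),\hat{\mathbf{z}}_n(\theta^*,\cdot),\theta)
\le \frac{1}{\lambda_n} O_p(1) + 8T(8K^2+2) r_n^2
= O_p\Big(\frac{1}{\lambda_n}\Big) + 8T(8K^2+2) r_n^2 \qquad \text{by Lemma 6}.
\]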



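By the definition of $J$, we have

\[
\begin{aligned}
\sup_{\theta^*\in\Theta_0\times\Gamma_0} \Big\| \frac{d\hat{\mathbf{x}}_n}{dt}(\theta^*,\cdot) - F(\hat{\mathbf{x}}_n(\theta^*,\cdot),\hat{\mathbf{z}}_n(\theta^*,\cdot),t,\theta) \Big\|^2_{L^2[0,T]}
&\le O_p\Big(\frac{1}{\lambda_n}\Big) + 8T(8K^2+2) r_n^2, \\
\sup_{\theta^*\in\Theta_0\times\Gamma_0} \Big\| \frac{d\hat{\mathbf{z}}_n}{dt}(\theta^*,\cdot) - G(\hat{\mathbf{x}}_n(\theta^*,\cdot),\hat{\mathbf{z}}_n(\theta^*,\cdot),t,\theta) \Big\|^2_{L^2[0,T]}
&\le O_p\Big(\frac{1}{\lambda_n}\Big) + 8T(8K^2+2) r_n^2.
\end{aligned}
\]

Therefore, by (1.3),

\[
\sup_{\theta^*\in\Theta_0\times\Gamma_0} \Big\| \hat{\mathbf{x}}_n(\theta^*,t) - x - \int_0^t F(\hat{\mathbf{x}}_n(\theta^*,s),\hat{\mathbf{z}}_n(\theta^*,s),s,\theta)\,ds \Big\|_\infty
\le O_p\Big(\frac{1}{\sqrt{\lambda_n}}\Big)\sqrt{T} + T\sqrt{8(8K^2+2)}\,r_n \tag{5.5}
\]

and similarly,

\[
\sup_{\theta^*\in\Theta_0\times\Gamma_0} \Big\| \hat{\mathbf{z}}_n(\theta^*,t) - z - \int_0^t G(\hat{\mathbf{x}}_n(\theta^*,s),\hat{\mathbf{z}}_n(\theta^*,s),s,\theta)\,ds \Big\|_\infty
\le O_p\Big(\frac{1}{\sqrt{\lambda_n}}\Big)\sqrt{T} + T\sqrt{8(8K^2+2)}\,r_n. \tag{5.6}
\]

Define

\[
\begin{aligned}
A_n(\theta^*,t) &= \hat{\mathbf{x}}_n(\theta^*,t) - x - \int_0^t F(\hat{\mathbf{x}}_n(\theta^*,s),\hat{\mathbf{z}}_n(\theta^*,s),s,\theta)\,ds, \\
B_n(\theta^*,t) &= \hat{\mathbf{z}}_n(\theta^*,t) - z - \int_0^t G(\hat{\mathbf{x}}_n(\theta^*,s),\hat{\mathbf{z}}_n(\theta^*,s),s,\theta)\,ds.
\end{aligned} \tag{5.7}
\]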
Jo 
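We will show that if $\sup_{\theta^*\in\Theta_0\times\Gamma_0} \|A_n(\theta^*,\cdot)\|_\infty$ and $\sup_{\theta^*\in\Theta_0\times\Gamma_0} \|B_n(\theta^*,\cdot)\|_\infty$ are small enough, then $|\hat{\mathbf{x}}_n(\theta^*,t)| \le R + 3$ and $|\hat{\mathbf{z}}_n(\theta^*,t)| \le R + 3$ for all $t \in [0,T]$, $\theta^* \in \Theta_0 \times \Gamma_0$. Because $(\mathbf{x}(\theta^*,\cdot),\mathbf{z}(\theta^*,\cdot))$ are solutions of (1.1) with initial values $(x,z)$, by integrating the equations (1.1), we have

\[
\begin{aligned}
\mathbf{x}(\theta^*,t) - x - \int_0^t F(\mathbf{x}(\theta^*,s),\mathbf{z}(\theta^*,s),s,\theta)\,ds &= 0, \\
\mathbf{z}(\theta^*,t) - z - \int_0^t G(\mathbf{x}(\theta^*,s),\mathbf{z}(\theta^*,s),s,\theta)\,ds &= 0.
\end{aligned}
\]

Then we subtract them from (5.7):

\[
\begin{aligned}
A_n(\theta^*,t) &= [\hat{\mathbf{x}}_n(\theta^*,t) - \mathbf{x}(\theta^*,t)]
 - \int_0^t [F(\hat{\mathbf{x}}_n(\theta^*,s),\hat{\mathbf{z}}_n(\theta^*,s),s,\theta) - F(\mathbf{x}(\theta^*,s),\mathbf{z}(\theta^*,s),s,\theta)]\,ds, \\
B_n(\theta^*,t) &= [\hat{\mathbf{z}}_n(\theta^*,t) - \mathbf{z}(\theta^*,t)]
 - \int_0^t [G(\hat{\mathbf{x}}_n(\theta^*,s),\hat{\mathbf{z}}_n(\theta^*,s),s,\theta) - G(\mathbf{x}(\theta^*,s),\mathbf{z}(\theta^*,s),s,\theta)]\,ds.
\end{aligned}
\]

So

\[
\begin{aligned}
|\hat{\mathbf{x}}_n(\theta^*,t) - \mathbf{x}(\theta^*,t)|
&\le \int_0^t |F(\hat{\mathbf{x}}_n(\theta^*,s),\hat{\mathbf{z}}_n(\theta^*,s),s,\theta) - F(\mathbf{x}(\theta^*,s),\mathbf{z}(\theta^*,s),s,\theta)|\,ds + |A_n(\theta^*,t)|, \\
|\hat{\mathbf{z}}_n(\theta^*,t) - \mathbf{z}(\theta^*,t)|
&\le \int_0^t |G(\hat{\mathbf{x}}_n(\theta^*,s),\hat{\mathbf{z}}_n(\theta^*,s),s,\theta) - G(\mathbf{x}(\theta^*,s),\mathbf{z}(\theta^*,s),s,\theta)|\,ds + |B_n(\theta^*,t)|.
\end{aligned}
\]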

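Define $\tau_{\theta^*} = [\inf\{t \ge 0 : |\hat{\mathbf{x}}_n(\theta^*,t)| \ge R+3 \text{ or } |\hat{\mathbf{z}}_n(\theta^*,t)| \ge R+3\}] \wedge T$, where for any $a, b \in \mathbb{R}$, $a \wedge b = \min(a,b)$. Then for any $0 \le s \le \tau_{\theta^*}$ and $\theta^* \in \Theta_0 \times \Gamma_0$, $|\hat{\mathbf{x}}_n(\theta^*,s)| \le R+3$ and $|\hat{\mathbf{z}}_n(\theta^*,s)| \le R+3$. By applying (5.4), we have for any $0 \le t \le \tau_{\theta^*}$,

\[
\begin{aligned}
|\hat{\mathbf{x}}_n(\theta^*,t) - \mathbf{x}(\theta^*,t)|
&\le \int_0^t |F(\hat{\mathbf{x}}_n(\theta^*,s),\hat{\mathbf{z}}_n(\theta^*,s),s,\theta) - F(\mathbf{x}(\theta^*,s),\mathbf{z}(\theta^*,s),s,\theta)|\,ds + |A_n(\theta^*,t)| \\
&\le \int_0^t |F(\hat{\mathbf{x}}_n(\theta^*,s),\hat{\mathbf{z}}_n(\theta^*,s),s,\theta) - F(\mathbf{x}(\theta^*,s),\mathbf{z}(\theta^*,s),s,\theta)|\,ds
 + \sup_{\theta^*\in\Theta_0\times\Gamma_0} \|A_n(\theta^*,\cdot)\|_\infty \\
&\le K\int_0^t |\hat{\mathbf{x}}_n(\theta^*,s) - \mathbf{x}(\theta^*,s)|\,ds + K\int_0^t |\hat{\mathbf{z}}_n(\theta^*,s) - \mathbf{z}(\theta^*,s)|\,ds
 + \sup_{\theta^*\in\Theta_0\times\Gamma_0} \|A_n(\theta^*,\cdot)\|_\infty.
\end{aligned}
\]

Similarly,

\[
|\hat{\mathbf{z}}_n(\theta^*,t) - \mathbf{z}(\theta^*,t)|
\le K\int_0^t |\hat{\mathbf{x}}_n(\theta^*,s) - \mathbf{x}(\theta^*,s)|\,ds + K\int_0^t |\hat{\mathbf{z}}_n(\theta^*,s) - \mathbf{z}(\theta^*,s)|\,ds
 + \sup_{\theta^*\in\Theta_0\times\Gamma_0} \|B_n(\theta^*,\cdot)\|_\infty.
\]

We add the above two inequalities together:

\[
\begin{aligned}
|\hat{\mathbf{x}}_n(\theta^*,t) - \mathbf{x}(\theta^*,t)| + |\hat{\mathbf{z}}_n(\theta^*,t) - \mathbf{z}(\theta^*,t)|
&\le 2K\int_0^t [|\hat{\mathbf{x}}_n(\theta^*,s) - \mathbf{x}(\theta^*,s)| + |\hat{\mathbf{z}}_n(\theta^*,s) - \mathbf{z}(\theta^*,s)|]\,ds \\
&\qquad + \sup_{\theta^*\in\Theta_0\times\Gamma_0} \|A_n(\theta^*,\cdot)\|_\infty + \sup_{\theta^*\in\Theta_0\times\Gamma_0} \|B_n(\theta^*,\cdot)\|_\infty.
\end{aligned}
\]

It follows from Gronwall's inequality that

\[
\begin{aligned}
|\hat{\mathbf{x}}_n(\theta^*,t) - \mathbf{x}(\theta^*,t)| + |\hat{\mathbf{z}}_n(\theta^*,t) - \mathbf{z}(\theta^*,t)|
&\le \Big[ \sup_{\theta^*\in\Theta_0\times\Gamma_0} \|A_n(\theta^*,\cdot)\|_\infty + \sup_{\theta^*\in\Theta_0\times\Gamma_0} \|B_n(\theta^*,\cdot)\|_\infty \Big] e^{2Kt} \\
&\le \Big[ \sup_{\theta^*\in\Theta_0\times\Gamma_0} \|A_n(\theta^*,\cdot)\|_\infty + \sup_{\theta^*\in\Theta_0\times\Gamma_0} \|B_n(\theta^*,\cdot)\|_\infty \Big] e^{2KT}
\end{aligned} \tag{5.8}
\]

for all $0 \le t \le \tau_{\theta^*}$, $\theta^* \in \Theta_0 \times \Gamma_0$.

By (5.5), (5.6) and (5.7), as $n \to \infty$, we have

\[
\sup_{\theta^*\in\Theta_0\times\Gamma_0} \|A_n(\theta^*,\cdot)\|_\infty + \sup_{\theta^*\in\Theta_0\times\Gamma_0} \|B_n(\theta^*,\cdot)\|_\infty \to 0.
\]

Hence, when $n$ is large enough, we have

\[
|\hat{\mathbf{x}}_n(\theta^*,t) - \mathbf{x}(\theta^*,t)| + |\hat{\mathbf{z}}_n(\theta^*,t) - \mathbf{z}(\theta^*,t)| < 1.
\]

By (5.2), $|\hat{\mathbf{x}}_n(\theta^*,t)| \le |\mathbf{x}(\theta^*,t)| + 1 \le R + 1 < R + 3$ and $|\hat{\mathbf{z}}_n(\theta^*,t)| \le |\mathbf{z}(\theta^*,t)| + 1 \le R + 1 < R + 3$ for all $t \in [0,\tau_{\theta^*}]$, $\theta^* \in \Theta_0 \times \Gamma_0$. By the definition of $\tau_{\theta^*}$, we must have $\tau_{\theta^*} = T$ for all $\theta^* \in \Theta_0 \times \Gamma_0$. By (5.5), (5.6), (5.7) and (5.8), we have

\[
\sup_{\theta^*\in\Theta_0\times\Gamma_0} \|\hat{\mathbf{x}}_n(\theta^*,\cdot) - \mathbf{x}(\theta^*,\cdot)\|_\infty
\le \Big[ O_p\Big(\frac{1}{\sqrt{\lambda_n}}\Big)\sqrt{T} + 2T\sqrt{8(8K^2+2)}\,r_n \Big] e^{2KT}.
\]

□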
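The Gronwall step that produces (5.8) can be illustrated on a grid. The sketch below is an illustration only: it takes a function $u$ that satisfies the integral inequality $u(t) \le c + 2K\int_0^t u(s)\,ds$ and checks numerically that it also satisfies the conclusion $u(t) \le c\,e^{2Kt}$. The constants $c$ and $K$, the test functions and the helper name are hypothetical, and the integral is approximated with the trapezoidal rule.

```python
import math


def check_gronwall(u, c, K, t_end, n=2000, tol=1e-12):
    """Check on a uniform grid whether u satisfies
    (hypothesis)  u(t) <= c + 2K * integral_0^t u(s) ds   and
    (conclusion)  u(t) <= c * exp(2 K t).
    Returns the pair (hypothesis_ok, conclusion_ok)."""
    h = t_end / n
    prev = u(0.0)
    integral = 0.0
    hypothesis_ok = prev <= c + tol
    conclusion_ok = prev <= c + tol
    for k in range(1, n + 1):
        t = k * h
        cur = u(t)
        integral += 0.5 * h * (prev + cur)   # trapezoidal rule
        if cur > c + 2.0 * K * integral + tol:
            hypothesis_ok = False
        if cur > c * math.exp(2.0 * K * t) + tol:
            conclusion_ok = False
        prev = cur
    return hypothesis_ok, conclusion_ok


c, K = 0.01, 1.5   # hypothetical constants
linear = check_gronwall(lambda t: c * (1.0 + 2.0 * K * t), c, K, 2.0)
extremal = check_gronwall(lambda t: c * math.exp(2.0 * K * t), c, K, 2.0)
too_fast = check_gronwall(lambda t: c * math.exp(4.0 * K * t), c, K, 2.0)
print(linear, extremal, too_fast)
```

The exponential $c\,e^{2Kt}$ is the extremal case, where both inequalities hold with equality; a function growing faster violates the hypothesis as well as the conclusion.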



Proof of Theorem 3.2. For any two given compact sets Go C G and 
To C r, first, we show 



(5.9) 



sup \Hn{±n{0*,-))-Hr,M9*,-))\ 

9*eeoxro 



e2^^0p(l)+o, 



Op ^ + 2rV8(8K2 + 2)r, 

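Second, define

\[
M_n(\theta^*) = H_n(\mathbf{x}(\theta^*,\cdot)) = -\frac{1}{n}\sum_{i=1}^n g(Y_i,\mathbf{x}(\theta^*,T_i)), \qquad \theta^* \in \Theta \times \Gamma.
\]

We show that

\[
\sup_{\theta^*\in\Theta_0\times\Gamma_0} |M_n(\theta^*) - M(\theta^*)| = o_p(1). \tag{5.10}
\]

Note that $M(\theta^*) = -E_{\theta_0^*}[g(Y_i,\mathbf{x}(\theta^*,T_i))]$. Finally, we show that $\hat\theta_n^* \to \theta_0^*$ in probability.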

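Now let us prove (5.9) and (5.10). Without loss of generality, we assume that $\Theta_0$ and $\Gamma_0$ are convex and contain $\theta_0$ and $(x_0,z_0)$. According to Assumption 2, $(\mathbf{x}(\theta^*,t),\mathbf{z}(\theta^*,t))$ are continuous functions of $(\theta^*,t)$. Because $\Theta_0 \times \Gamma_0 \times [0,T]$ is a compact set, there exists a positive number $R$ depending on $\Theta_0$ and $\Gamma_0$, such that

\[
|\mathbf{x}(\theta^*,t)| \le R, \quad |\mathbf{z}(\theta^*,t)| \le R \qquad \forall (\theta^*,t) \in \Theta_0 \times \Gamma_0 \times [0,T].
\]

Define $V_n = \sup_{\theta^*\in\Theta_0\times\Gamma_0} \|\hat{\mathbf{x}}_n(\theta^*,\cdot) - \mathbf{x}(\theta^*,\cdot)\|_\infty$. From Theorem 3.1,

\[
V_n \le \Big[ O_p\Big(\frac{1}{\sqrt{\lambda_n}}\Big)\sqrt{T} + 2T\sqrt{8(8K^2+2)}\,r_n \Big] e^{2KT} = o_p(1). \tag{5.11}
\]

By Assumption 5, there exists a positive number $\delta$ such that

\[
\sup_{|x|\le R+1} \Big| \frac{\partial g}{\partial x}(y,x) \Big| \le \frac{1}{\delta}\Big( 1 + \inf_{|x|\le R+1} \Big| \frac{\partial g}{\partial x}(y,x) \Big| \Big) \qquad \forall y \in \mathbb{R}. \tag{5.12}
\]

So

\[
\begin{aligned}
|H_n(\hat{\mathbf{x}}_n(\theta^*,\cdot)) - H_n(\mathbf{x}(\theta^*,\cdot))|
&\le \frac{1}{n}\sum_{i=1}^n |g(Y_i,\hat{\mathbf{x}}_n(\theta^*,T_i)) - g(Y_i,\mathbf{x}(\theta^*,T_i))| \\
&\le \frac{1}{n}\sum_{i=1}^n |g(Y_i,\hat{\mathbf{x}}_n(\theta^*,T_i)) - g(Y_i,\mathbf{x}(\theta^*,T_i))|\,1_{[V_n\le 1]} \\
&\qquad + \frac{1}{n}\sum_{i=1}^n |g(Y_i,\hat{\mathbf{x}}_n(\theta^*,T_i)) - g(Y_i,\mathbf{x}(\theta^*,T_i))|\,1_{[V_n>1]} \\
&\le \frac{1}{n}\sum_{i=1}^n \Big[ \sup_{|x|\le R+1} \Big| \frac{\partial g}{\partial x}(Y_i,x) \Big| \Big] \|\hat{\mathbf{x}}_n(\theta^*,\cdot) - \mathbf{x}(\theta^*,\cdot)\|_\infty\,1_{[V_n\le 1]} \\
&\qquad + \frac{1}{n}\sum_{i=1}^n |g(Y_i,\hat{\mathbf{x}}_n(\theta^*,T_i)) - g(Y_i,\mathbf{x}(\theta^*,T_i))|\,1_{[V_n>1]} \\
&\le \frac{1}{n}\sum_{i=1}^n \frac{1}{\delta}\Big[ 1 + \inf_{|x|\le R+1} \Big| \frac{\partial g}{\partial x}(Y_i,x) \Big| \Big] V_n\,1_{[V_n\le 1]} \qquad \text{by (5.12)} \\
&\qquad + \frac{1}{n}\sum_{i=1}^n |g(Y_i,\hat{\mathbf{x}}_n(\theta^*,T_i)) - g(Y_i,\mathbf{x}(\theta^*,T_i))|\,1_{[V_n>1]} \\
&\le \frac{1}{n}\sum_{i=1}^n \frac{1}{\delta}\Big[ 1 + \Big| \frac{\partial g}{\partial x}(Y_i,\mathbf{x}(\theta_0^*,T_i)) \Big| \Big] V_n \\
&\qquad + \frac{1}{n}\sum_{i=1}^n |g(Y_i,\hat{\mathbf{x}}_n(\theta^*,T_i)) - g(Y_i,\mathbf{x}(\theta^*,T_i))|\,1_{[V_n>1]}.
\end{aligned}
\]

Taking the supremum over $\theta^*$,

\[
\begin{aligned}
\sup_{\theta^*\in\Theta_0\times\Gamma_0} |H_n(\hat{\mathbf{x}}_n(\theta^*,\cdot)) - H_n(\mathbf{x}(\theta^*,\cdot))|
&\le \frac{1}{n}\sum_{i=1}^n \frac{1}{\delta}\Big[ 1 + \Big| \frac{\partial g}{\partial x}(Y_i,\mathbf{x}(\theta_0^*,T_i)) \Big| \Big] V_n \\
&\qquad + \sup_{\theta^*\in\Theta_0\times\Gamma_0} \Big[ \frac{1}{n}\sum_{i=1}^n |g(Y_i,\hat{\mathbf{x}}_n(\theta^*,T_i)) - g(Y_i,\mathbf{x}(\theta^*,T_i))| \Big] 1_{[V_n>1]}.
\end{aligned} \tag{5.13}
\]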
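By the law of large numbers and Assumption 5,

\[
\frac{1}{n}\sum_{i=1}^n \frac{1}{\delta}\Big[ 1 + \Big| \frac{\partial g}{\partial x}(Y_i,\mathbf{x}(\theta_0^*,T_i)) \Big| \Big]
\to \frac{1}{\delta}\Big[ 1 + E_{\theta_0^*}\Big| \frac{\partial g}{\partial x}(Y_i,\mathbf{x}(\theta_0^*,T_i)) \Big| \Big] < \infty
\]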



in probability, so it is Op{l). From (5.11), the second term on the right-hand 
side of (5.13) is not zero only in the event [Vn > 1] whose probability goes 
to zero, so it is Op(^). Equation (5.9) has been proved. The equality (5.10) 
follows from the lemma below. 

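Lemma 7. The two classes $\{\mathbf{x}(\theta^*,\cdot), \theta^* \in \Theta_0 \times \Gamma_0\}$ and $\{g(\cdot,\mathbf{x}(\theta^*,\cdot)), \theta^* \in \Theta_0 \times \Gamma_0\}$ are both $P_{\theta_0^*}$-Glivenko-Cantelli.

Proof. For any $\theta^{*\prime}, \theta^{*\prime\prime} \in \Theta_0 \times \Gamma_0$, let $\theta^{*\prime} = (\theta',x',z')$ and $\theta^{*\prime\prime} = (\theta'',x'',z'')$. Since $\Theta_0 \times \Gamma_0$ is convex, by Taylor expansion, we have

\[
\begin{aligned}
|\mathbf{x}(\theta^{*\prime},t) - \mathbf{x}(\theta^{*\prime\prime},t)|
&\le \sup_{\theta^*\in\Theta_0\times\Gamma_0,\ 0\le s\le T} \Big| \frac{\partial \mathbf{x}}{\partial \theta}(\theta^*,s) \Big|\,|\theta' - \theta''|
 + \sup_{\theta^*\in\Theta_0\times\Gamma_0,\ 0\le s\le T} \Big| \frac{\partial \mathbf{x}}{\partial x}(\theta^*,s) \Big|\,|x' - x''| \\
&\qquad + \sup_{\theta^*\in\Theta_0\times\Gamma_0,\ 0\le s\le T} \Big| \frac{\partial \mathbf{x}}{\partial z}(\theta^*,s) \Big|\,|z' - z''|.
\end{aligned}
\]

By Assumption 2, $\frac{\partial \mathbf{x}}{\partial \theta}$, $\frac{\partial \mathbf{x}}{\partial x}$ and $\frac{\partial \mathbf{x}}{\partial z}$ are all continuous functions of $(\theta^*,t)$. Therefore, they are all bounded on the compact set $\Theta_0 \times \Gamma_0 \times [0,T]$. We can find a positive constant $C$ such that

\[
|\mathbf{x}(\theta^{*\prime},t) - \mathbf{x}(\theta^{*\prime\prime},t)| \le C|\theta^{*\prime} - \theta^{*\prime\prime}| \qquad \forall \theta^{*\prime}, \theta^{*\prime\prime} \in \Theta_0 \times \Gamma_0,\ 0 \le t \le T.
\]

That is, the class $\{\mathbf{x}(\theta^*,\cdot), \theta^* \in \Theta_0 \times \Gamma_0\}$ is Lipschitz in $\theta^* \in \Theta_0 \times \Gamma_0$. It follows from Theorem 2.7.11 in the book of van der Vaart and Wellner [26] that the $L_1(P_{\theta_0^*})$-bracketing number is bounded by the covering number $N(\epsilon, \Theta_0 \times \Gamma_0, |\cdot|)$ of $\Theta_0 \times \Gamma_0$, where $|\cdot|$ is the Euclidean distance. Because $\Theta_0 \times \Gamma_0$ is a bounded subset of $\mathbb{R}^{d+2}$,

\[
N(\epsilon, \Theta_0 \times \Gamma_0, |\cdot|) \le \text{constant} \times \Big(\frac{1}{\epsilon}\Big)^{d+2}.
\]

Then our lemma follows from Theorem 2.4.1 in the book of van der Vaart and Wellner [26]. Similar to the derivation of (5.13), we can get

\[
\begin{aligned}
|g(y,\mathbf{x}(\theta^{*\prime},t)) - g(y,\mathbf{x}(\theta^{*\prime\prime},t))|
&\le \frac{1}{\delta}\Big[ 1 + \Big| \frac{\partial g}{\partial x}(y,\mathbf{x}(\theta_0^*,t)) \Big| \Big]\,|\mathbf{x}(\theta^{*\prime},t) - \mathbf{x}(\theta^{*\prime\prime},t)| \\
&\le \frac{C}{\delta}\Big[ 1 + \Big| \frac{\partial g}{\partial x}(y,\mathbf{x}(\theta_0^*,t)) \Big| \Big]\,|\theta^{*\prime} - \theta^{*\prime\prime}|.
\end{aligned} \tag{5.14}
\]

By Assumption 5, $|\frac{\partial g}{\partial x}(y,\mathbf{x}(\theta_0^*,t))|$ has a finite expectation. Hence, by the same argument as for $\mathbf{x}(\theta^*,\cdot)$, we can get the conclusion for $g(\cdot,\mathbf{x}(\theta^*,\cdot))$. □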



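Because $\hat\theta_n^*$ is uniformly tight, for any $\epsilon > 0$, there exist compact sets $\Theta_0$ and $\Gamma_0$ such that

\[
P_{\theta_0^*}(\Omega) \ge 1 - \epsilon, \tag{5.15}
\]

where

\[
\Omega = [\hat\theta_n^* \in \Theta_0 \times \Gamma_0 \text{ for all } n]. \tag{5.16}
\]

Without loss of generality, we assume that $\Theta_0$ and $\Gamma_0$ contain $\theta_0$ and $(x_0,z_0)$. Let $\eta$ be any positive number. By Assumption 3, $\theta_0^*$ is the unique maximum point of $M(\theta^*)$, hence

\[
\gamma = M(\theta_0^*) - \sup_{\theta^*\in\Theta_0\times\Gamma_0,\ |\theta^*-\theta_0^*|\ge\eta} M(\theta^*) > 0 \tag{5.17}
\]

due to the compactness of $\Theta_0 \times \Gamma_0$ and the continuity of $M$. For $\hat\theta_n^* \in \Theta_0 \times \Gamma_0$,

\[
\begin{aligned}
M(\theta_0^*) - M(\hat\theta_n^*)
&= M_n(\theta_0^*) - M_n(\hat\theta_n^*) + o_p(1) \qquad \text{by (5.10)} \\
&= H_n(\mathbf{x}(\theta_0^*,\cdot)) - H_n(\mathbf{x}(\hat\theta_n^*,\cdot)) + o_p(1) \\
&\le H_n(\hat{\mathbf{x}}_n(\theta_0^*,\cdot)) + \sup_{\theta^*\in\Theta_0\times\Gamma_0} |H_n(\hat{\mathbf{x}}_n(\theta^*,\cdot)) - H_n(\mathbf{x}(\theta^*,\cdot))| \\
&\qquad - H_n(\hat{\mathbf{x}}_n(\hat\theta_n^*,\cdot)) + \sup_{\theta^*\in\Theta_0\times\Gamma_0} |H_n(\hat{\mathbf{x}}_n(\theta^*,\cdot)) - H_n(\mathbf{x}(\theta^*,\cdot))| + o_p(1) \\
&\le H_n(\hat{\mathbf{x}}_n(\theta_0^*,\cdot)) - H_n(\hat{\mathbf{x}}_n(\hat\theta_n^*,\cdot))
 + 2\sup_{\theta^*\in\Theta_0\times\Gamma_0} |H_n(\hat{\mathbf{x}}_n(\theta^*,\cdot)) - H_n(\mathbf{x}(\theta^*,\cdot))| + o_p(1) \\
&\le H_n(\hat{\mathbf{x}}_n(\theta_0^*,\cdot)) - H_n(\hat{\mathbf{x}}_n(\hat\theta_n^*,\cdot)) + o_p(1) \qquad \text{by (5.9)}.
\end{aligned}
\]

By the definition of $\hat\theta_n^*$, we have

\[
H_n(\hat{\mathbf{x}}_n(\theta_0^*,\cdot)) - H_n(\hat{\mathbf{x}}_n(\hat\theta_n^*,\cdot)) \le 0,
\]

and then $M(\theta_0^*) - M(\hat\theta_n^*) \le o_p(1)$. By (5.16) and (5.17), we have

\[
[|\hat\theta_n^* - \theta_0^*| \ge \eta] \cap \Omega \subset [M(\theta_0^*) - M(\hat\theta_n^*) \ge \gamma] \cap \Omega \subset [o_p(1) \ge \gamma] \cap \Omega.
\]

Now

\[
\begin{aligned}
P_{\theta_0^*}[|\hat\theta_n^* - \theta_0^*| \ge \eta]
&\le P_{\theta_0^*}([|\hat\theta_n^* - \theta_0^*| \ge \eta] \cap \Omega) + P_{\theta_0^*}(\Omega^c) \\
&\le P_{\theta_0^*}([o_p(1) \ge \gamma] \cap \Omega) + P_{\theta_0^*}(\Omega^c)
\le P_{\theta_0^*}[o_p(1) \ge \gamma] + \epsilon \qquad \text{by (5.15)}.
\end{aligned}
\]

We have

\[
\limsup_{n\to\infty} P_{\theta_0^*}[|\hat\theta_n^* - \theta_0^*| \ge \eta] \le \epsilon.
\]

Since $\epsilon$ is arbitrary, we have $P_{\theta_0^*}[|\hat\theta_n^* - \theta_0^*| \ge \eta] \to 0$.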

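Proof of Theorem 3.3. For any convex compact sets $\Theta_0 \subset \Theta$ and $\Gamma_0 \subset \Gamma$ such that $\theta_0^*$ is an interior point of $\Theta_0 \times \Gamma_0$, let $\bar\theta_n^*$ be the maximizer of $H_n(\hat{\mathbf{x}}_n(\theta^*,\cdot))$ in $\Theta_0 \times \Gamma_0$. Then the events

\[
[\bar\theta_n^* \ne \hat\theta_n^*] = [\hat\theta_n^* \notin \Theta_0 \times \Gamma_0] \tag{5.18}
\]

coincide. We will prove the asymptotic normality for $\bar\theta_n^*$ by using Theorem 5.23 in the book of van der Vaart [25]. First, by (5.14), for any $\theta^{*\prime}, \theta^{*\prime\prime} \in \Theta_0 \times \Gamma_0$,

\[
|g(y,\mathbf{x}(\theta^{*\prime},t)) - g(y,\mathbf{x}(\theta^{*\prime\prime},t))|
\le \frac{C}{\delta}\Big[ 1 + \Big| \frac{\partial g}{\partial x}(y,\mathbf{x}(\theta_0^*,t)) \Big| \Big]\,|\theta^{*\prime} - \theta^{*\prime\prime}|, \tag{5.19}
\]

where $C$ is a constant depending on $\Theta_0$, $\Gamma_0$ and $T$, and $\delta$ is the constant in (5.12). By Assumption 6, $1 + |\frac{\partial g}{\partial x}(y,\mathbf{x}(\theta_0^*,t))|$ has a finite second moment. Second, we will prove a Taylor expansion for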

M{e*) = -Ee^Jg{YiMO*,Ti))], 

in the neighborhood of ^q- We expand 

9{yM0*,t)) 



(5.20) 



(5.21) 



dg 



+ 



dx 



T 



(w^(^o,t)] (9* -9*0) 



+ 



1 



dx 



89* 89*' 



X {e*-9l) + Ro, 
where $(\theta^* - \theta_0^*)^T$ denotes the transpose of $(\theta^* - \theta_0^*)$ and $R_0$ is the remainder term. If we define the continuous matrix-valued function

D{y,t,9*) = |^(y,x(r,t)) -^^^{9*,t) 
\y, , J dx^ ^ " d9*d9*^ 

we can express the remainder term by an integral 

-I 



Ro = (9* - 91Y 



f [D{y, t, 91 + s{9* -91))- D{y, t, 0*)] (1 - s) ds 
Jo 



By using the same argument in the proof of Lemma 6, we have 



\D{y,t,9*)\< 



1 + 



dx 



+ 



dx"^ 



y9* G Go X To, 



where C is a constant depending on 0o, Fq and T. The right-hand side of 
the last inequality has a finite expectation by Assumption 6. Hence, by the 
dominated convergence theorem, 
fi 



[ [D{y, t, 91 + s{9* -91))- D{y, t, 9*^)] (1 - s) ds 
Jo 







as 9* ^ 9q. From (5.20), we have 

(5.22) M{9*) = M{91) + {9* - 9l)^Ve*{9* - 9^) + o{\\9* - 9lf), 
where 

Ve* = -Ee'^\D{y,t,9l)] 

and there is no linear term in $\theta^* - \theta_0^*$ because $\theta_0^*$ is the maximum point of $M$.

Finally, we have 

>Hn{MGnr))- sup \Hn{M9*,-))-HnM9*,-))\ 

= sup Hn{-Xn{9* , ■)) 

e*eeoxro 

- sup |F„(x„(r,-))-i^n(x„(r,-))| 

6»*eeoxro 



(5.23) 



by definition of ^* 
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> sup [Hni^ie*,-))- sup \Hn{^{6*,-))-Hn{^n{e*,-m 
e'eeoxTo e^eeoxro 

- sup |/f„(x„(r,-))-/^n(Xn(r,-))| 

e*e0oxro 

> sup i7„(x(r,-))-2 sup |i7„(x(r,-))-i^n(xn(r,-))|. 

6»*60oxro 6»*eeoxro 



By (5.9) and 



we have 



An , 
^ — )• oo and r„ 



as n — )• oo, 



sup \Hn{Me*,-))-Hn{^{e*,-))\ 

e*eeoxro 



Or. 



1 



r + 2r V8(8i^2 + 2) 



n 



77, 



Now if we regard $\Theta_0 \times \Gamma_0$ as the parameter space, then by (5.19), (5.22) and (5.23), it follows from Theorem 5.23 in the book of van der Vaart [25] that $\sqrt{n}(\bar\theta_n^* - \theta_0^*)$ is asymptotically normal with mean zero and covariance matrix



(5.24) V,-^'Eer, 



dg_ 

dx 



d: 



X 



09* 



T 



X 



d. 

89 



Note that the asymptotic covariance matrix does not depend on $\Theta_0\times\Gamma_0$. Because the estimators are tight and by (5.18), we can make

supP,* [9: + 9l\ = snpPe* K ^ Go x Tq] 

n n 

arbitrarily small by taking $\Theta_0\times\Gamma_0$ large. It follows from the lemma below that
\/n[9^ — 9q) is asymptotically normal with mean zero and covariance ma- 
trix (5.24). Similarly, we can prove the asymptotic normality of ^* with the 
same asymptotic covariance matrix. □ 

Lemma 8. Let $\{X_n : n=1,2,\ldots\}$ be a sequence of random variables. For each $m=1,2,\ldots$, there is a sequence of random variables $\{X_n^{(m)} : n=1,2,\ldots\}$ such that for any $\varepsilon>0$, we have

\[
\lim_{m\to\infty}\sup_n P\bigl(|X_n-X_n^{(m)}|>\varepsilon\bigr)=0.
\]

Suppose that for each $m$, the sequence $\{X_n^{(m)} : n=1,2,\ldots\}$ converges weakly to the same random variable $X$ which does not depend on $m$. Then $\{X_n : n=1,2,\ldots\}$ converges weakly to $X$.

Proof. We calculate the characteristic function of $\{X_n : n=1,2,\ldots\}$. Fix $u\in\mathbb{R}$.

Because $e^{iut}$ is a continuous function of $t$ and $|e^{iut}-1|$ is bounded by 2, for any $\delta>0$, there exists $\varepsilon>0$ such that

\[
|e^{iut}-1|<\delta\qquad\forall\,|t|<\varepsilon.
\]

We have

\[
\bigl|E(e^{iuX_n})-E(e^{iuX_n^{(m)}})\bigr|\le E\bigl|e^{iu(X_n-X_n^{(m)})}-1\bigr|\le 2P\bigl(|X_n-X_n^{(m)}|>\varepsilon\bigr)+\delta.
\]

Hence,

\[
\limsup_{m\to\infty}\sup_n\bigl|E(e^{iuX_n})-E(e^{iuX_n^{(m)}})\bigr|\le\delta.
\]

Since $\delta$ is arbitrary, we have

\[
\lim_{m\to\infty}\sup_n\bigl|E(e^{iuX_n})-E(e^{iuX_n^{(m)}})\bigr|=0.
\]

Because for each $m$, the sequence $\{X_n^{(m)} : n=1,2,\ldots\}$ converges weakly to $X$, $E(e^{iuX_n^{(m)}})\to E(e^{iuX})$ as $n\to\infty$. So we have $E(e^{iuX_n})\to E(e^{iuX})$, that is, $\{X_n : n=1,2,\ldots\}$ converges weakly to $X$. □
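The key inequality in the proof of Lemma 8 can be checked on a small example. The sketch below is only an illustration: the joint distribution `outcomes` of a pair (X, Y), the frequency `u` and the threshold `eps` are hypothetical values chosen for the check, with X playing the role of $X_n$ and Y the role of $X_n^{(m)}$.

```python
import cmath
import math

# Hypothetical joint distribution of (X, Y) as (probability, x, y) triples.
outcomes = [(0.4, 0.0, 0.05), (0.3, 1.0, 0.98), (0.2, -0.5, -0.5), (0.1, 2.0, 0.0)]
u, eps = 1.5, 0.1

def char_fn(pairs):
    # E exp(i u V) for a discrete variable given as (probability, value) pairs.
    return sum(p * cmath.exp(1j * u * v) for p, v in pairs)

# |E exp(iuX) - E exp(iuY)|
lhs = abs(char_fn([(p, x) for p, x, _ in outcomes])
          - char_fn([(p, y) for p, _, y in outcomes]))

# E |exp(iu(X - Y)) - 1|
middle = sum(p * abs(cmath.exp(1j * u * (x - y)) - 1.0) for p, x, y in outcomes)

# P(|X - Y| > eps)
prob_far = sum(p for p, x, y in outcomes if abs(x - y) > eps)

# sup over |t| <= eps of |exp(iut) - 1| equals 2 sin(|u| eps / 2) when |u| eps <= pi.
delta = 2.0 * math.sin(abs(u) * eps / 2.0)
rhs = 2.0 * prob_far + delta

print(lhs, middle, rhs)
```

The three printed numbers are nondecreasing, matching the chain "difference of characteristic functions, then expected modulus, then 2P(|X - Y| > eps) + delta" used in the proof.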

Proof of Lemma 5. If the lemma is wrong, then there exist $M>0$, $\delta>0$ and a sequence $\{L_q : q\ge1\}$ with $r_q\to0$, such that for each $q$, there exist

\[
(w_q,v_q)\in\operatorname*{arg\,min}_{\substack{w,v\in L_q,\ w(0)=x(0),\ v(0)=z(0)\\ \|w\|_\infty\le M,\ \|v\|_\infty\le M}}J(w,v,\theta)
\]

with

(5.25)  $\|w_q-x\|_\infty>\delta$ or $\|v_q-z\|_\infty>\delta$.

We will show that $\{(w_q,v_q) : q\ge1\}$ are equicontinuous. Fix any $\eta>0$. Let $K$ be a positive constant such that

\[
|F(x,z,t,\theta)|\le K,\qquad|G(x,z,t,\theta)|\le K\qquad\forall\,|x|,|z|\le M,\ t\in[0,T].
\]





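For any $t_0\in[0,T]$,

\begin{align*}
|w_q(t)-w_q(t_0)|
&=\Bigl|\int_{t_0}^{t}\frac{dw_q}{dt}(s)\,ds\Bigr|\\
&\le\Bigl|\int_{t_0}^{t}\Bigl|\frac{dw_q}{dt}(s)-F(w_q(s),v_q(s),s,\theta)+F(w_q(s),v_q(s),s,\theta)\Bigr|\,ds\Bigr|\\
&\le\Bigl|\int_{t_0}^{t}\Bigl|\frac{dw_q}{dt}(s)-F(w_q(s),v_q(s),s,\theta)\Bigr|\,ds\Bigr|+K|t-t_0|\\
&\le\sqrt{T}\Bigl(\int_0^T\Bigl|\frac{dw_q}{dt}(s)-F(w_q(s),v_q(s),s,\theta)\Bigr|^2ds\Bigr)^{1/2}+K|t-t_0|\\
&\le\sqrt{T\,J(w_q,v_q,\theta)}+K|t-t_0|.
\end{align*}

By an argument similar to that in the proof of Theorem 3.1, we have

\[
J(w_q,v_q,\theta)\to0,
\]

because $r_q\to0$. Therefore, there exists $q_0$ such that if $q>q_0$, $J(w_q,v_q,\theta)<\bigl(\frac{\eta}{2\sqrt{T}}\bigr)^2$. So for any $q>q_0$ and $|t-t_0|<\frac{\eta}{2K}$ we have

\[
|w_q(t)-w_q(t_0)|<\eta.
\]

Hence, $\{w_q : q\ge1\}$ are equicontinuous. Similarly, $\{v_q : q\ge1\}$ are equicontinuous. Then by Ascoli's theorem there is a uniformly convergent subsequence. Without loss of generality, we assume that $w_q\to x_0$ and $v_q\to z_0$ uniformly. We will show that $(x_0,z_0)$ are also the solutions of (1.1).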

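\[
\Bigl|\int_{t_0}^{t}\Bigl|\frac{dw_q}{dt}(s)-F(w_q(s),v_q(s),s,\theta)\Bigr|\,ds\Bigr|\le\sqrt{T\,J(w_q,v_q,\theta)}\to0.
\]

Hence, $\frac{dw_q}{dt}\to F(x_0,z_0,t,\theta)$ in $L^2[0,T]$. Then we have

\[
w_q\to x(0)+\int_0^t F(x_0(s),z_0(s),s,\theta)\,ds
\]

uniformly, that is,

\[
x_0(t)=x(0)+\int_0^t F(x_0(s),z_0(s),s,\theta)\,ds.
\]

Similarly,

\[
z_0(t)=z(0)+\int_0^t G(x_0(s),z_0(s),s,\theta)\,ds.
\]

Hence $(x_0,z_0)$ are the solutions of (1.1). By the uniqueness, $(x,z)=(x_0,z_0)$, so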
$(w_q,v_q)\to(x_0,z_0)$ uniformly. This contradicts (5.25). □

Proof of Theorem 4.1. Let $\tau=(0=t_1<\cdots<t_{l+1}=T)$. Then the dimension of $L$ is $l+3$. Let $\{\phi_j : j=-2,-1,0,\ldots,l\}$ be the B-spline bases. For each $1\le j\le l-3$, $\phi_j$ is a piecewise polynomial of order 4 and vanishes outside the interval $(t_j,t_{j+4})$. All the bases are bounded by 1. For any $x\in L$, let

\[
x=\sum_{j=-2}^{l}c_j\phi_j,
\]

where $(c_j,\ j=-2,\ldots,l)$ are coefficients.
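The three properties of the cubic B-spline basis used here (dimension $l+3$, local support on four consecutive knot intervals, values bounded by 1) can be illustrated numerically. The sketch below is an assumption-laden illustration, not the authors' code: it uses the Cox-de Boor recursion, hypothetical knots `breaks`, boundary knots repeated three extra times, and 0-based basis indices instead of the paper's $j=-2,\ldots,l$.

```python
def bspline_basis(i, k, knots, t):
    # Order-k (degree k-1) B-spline starting at knots[i], via the Cox-de Boor recursion.
    if k == 1:
        return 1.0 if knots[i] <= t < knots[i + 1] else 0.0
    left = right = 0.0
    d1 = knots[i + k - 1] - knots[i]
    if d1 > 0:
        left = (t - knots[i]) / d1 * bspline_basis(i, k - 1, knots, t)
    d2 = knots[i + k] - knots[i + 1]
    if d2 > 0:
        right = (knots[i + k] - t) / d2 * bspline_basis(i + 1, k - 1, knots, t)
    return left + right

# Hypothetical knot sequence with l = 5 intervals on [0, T], T = 4.
breaks = [0.0, 0.5, 1.2, 2.0, 3.1, 4.0]
knots = [breaks[0]] * 3 + breaks + [breaks[-1]] * 3   # extended knot vector
n_basis = len(knots) - 4                               # number of order-4 basis functions

grid = [breaks[-1] * j / 400 for j in range(400)]      # sample points in [0, T)
values = [[bspline_basis(i, 4, knots, t) for i in range(n_basis)] for t in grid]

# The basis functions sum to one and each lies in [0, 1].
sums_to_one = all(abs(sum(row) - 1.0) < 1e-12 for row in values)
bounded = all(-1e-12 <= v <= 1.0 + 1e-12 for row in values for v in row)

# Basis i vanishes outside [knots[i], knots[i + 4]).
local = all(
    v == 0.0
    for t, row in zip(grid, values)
    for i, v in enumerate(row)
    if t < knots[i] or t >= knots[i + 4]
)

print(n_basis, sums_to_one, bounded, local)
```

With 5 knot intervals the sketch produces 8 basis functions, in line with the dimension count $l+3$.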

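Because $F$ has continuous third-order partial derivatives, $x$ has a continuous fourth derivative. Let

\[
R=\|x\|_\infty,\qquad R_1=\Bigl\|\frac{dx}{dt}\Bigr\|_\infty,\qquad K_2=\Bigl\|\frac{d^2x}{dt^2}\Bigr\|_\infty\qquad\text{and}\qquad K_4=\Bigl\|\frac{d^4x}{dt^4}\Bigr\|_\infty.
\]

We can also find positive numbers $K_0$, $K$ and $K_1$, such that

(5.26)
\[
|F(x,t)|\le K_0,\qquad|F(x,t)-F(x',t)|\le K|x-x'|,\qquad|F_x(x,t)|\le K_1,\qquad|F_t(x,t)|\le K_1\qquad\forall\,|x|\le R+1,\ |x'|\le R+1.
\]

By definition of $r$ and the proof of Lemma 1, we have

(5.27)  $r\le\max\{C_0K_4|\tau|^4,\ C_1K_4|\tau|^3\}=C_1K_4|\tau|^3,$

where $C_0<C_1$ are constants. Because $F_x(x(t),t)<0$ for all $0\le t\le T$, we can find positive numbers $\delta_2<1$ and $\gamma$ such that

(5.28)  $F_x(x,t)\le-\gamma\qquad\forall\,|x-x(t)|\le\delta_2,\ 0\le t\le T.$

Let

\[
\hat x_0\in\operatorname*{arg\,min}_{w\in L,\ w(0)=x(0),\ \|w-x\|_\infty\le\delta_2}J(w).
\]

First, we show: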

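Lemma 9. If $2r<\delta_2$, we have

\[
J(\hat x_0)\le4(4K^2+2)Tr^2.
\]

Proof. By the definition (4.5) of $r$, there exists $w\in L$ with $w(0)=x(0)$ such that

\[
\|x-w\|_\infty\le2r\qquad\text{and}\qquad\Bigl\|\frac{dx}{dt}-\frac{dw}{dt}\Bigr\|_\infty\le2r.
\]

By (5.26), we have

\begin{align*}
\int_0^T\Bigl|\frac{dw}{dt}-F(w,t)\Bigr|^2dt
&=\int_0^T\Bigl|\frac{dw}{dt}-\frac{dx}{dt}+F(x,t)-F(w,t)\Bigr|^2dt\\
&\le2\int_0^T\Bigl|\frac{dw}{dt}-\frac{dx}{dt}\Bigr|^2dt+2\int_0^T|F(x,t)-F(w,t)|^2dt\\
&\le2\int_0^T\Bigl|\frac{dw}{dt}-\frac{dx}{dt}\Bigr|^2dt+4K^2\int_0^T|x-w|^2dt\le4(4K^2+2)Tr^2.
\end{align*}

By the definition of $\hat x_0$, we have

\[
J(\hat x_0)\le J(w)=\int_0^T\Bigl|\frac{dw}{dt}-F(w,t)\Bigr|^2dt\le4(4K^2+2)Tr^2.
\]

□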



Lemma 10. For any $\bar t\in\bigl\{0<s<T:\frac{d\hat x_0}{dt}(s)-\frac{dx}{dt}(s)=0\bigr\}$,



\x{t) - xo{t)\ < l3iK 



d'^xo 



dt^ 



L2[0,T] 

+ k(4\/6^ + /33)/34\/r|r|3/2 + /3gVr|Tr/2^ 
where /325 /^s, /34, /Se '^'^e constants depending only on x and F. 

Proof. Pick $i$ such that $\bar t\in(t_i,t_{i+4})\subset[0,T]$. There exists a small positive number $\eta$, such that for any $|u|<\eta$, $u\in\mathbb{R}$, we have

\[
\|(\hat x_0+u\phi_i)-x\|_\infty\le\delta_2.
\]

By the definition of $\hat x_0$, we have

\[
J(\hat x_0)\le J(\hat x_0+u\phi_i),
\]

that is, 0 is the minimum point of the function $J(\hat x_0+u\phi_i)$ of $u$ in the region $\{u : |u|<\eta,\ u\in\mathbb{R}\}$. Therefore,



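\begin{align*}
0&=\frac{d}{du}J(\hat x_0+u\phi_i)\Big|_{u=0}\\
&=2\int\Bigl(\frac{d\phi_i}{dt}-F_x(\hat x_0(t),t)\phi_i(t)\Bigr)\Bigl(\frac{d\hat x_0}{dt}-F(\hat x_0(t),t)\Bigr)dt,
\end{align*}

and hence

(5.29)
\begin{align*}
\int\frac{d\phi_i}{dt}\Bigl(\frac{d\hat x_0}{dt}-F(\hat x_0(t),t)\Bigr)dt
&=\int F_x(\hat x_0(t),t)\phi_i(t)\Bigl(\frac{d\hat x_0}{dt}-\frac{dx}{dt}+F(x(t),t)-F(\hat x_0(t),t)\Bigr)dt\\
&=\int F_x(\hat x_0(t),t)\phi_i(t)\Bigl(\frac{d\hat x_0}{dt}-\frac{dx}{dt}\Bigr)dt\\
&\quad+\int F_x(\hat x_0(t),t)\phi_i(t)\bigl(F(x(t),t)-F(x(\bar t),\bar t)+F(\hat x_0(\bar t),\bar t)-F(\hat x_0(t),t)\bigr)dt\\
&\quad+\int F_x(\hat x_0(t),t)\phi_i(t)\bigl(F(x(\bar t),\bar t)-F(\hat x_0(\bar t),\bar t)\bigr)dt.
\end{align*}

The integrals are from $t_i$ to $t_{i+4}$ because $\phi_i$ and $\frac{d\phi_i}{dt}$ vanish outside $(t_i,t_{i+4})$. We will estimate every term on the right-hand side of (5.29).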
First, by the formula (see DeBoor [8])

\[
\frac{d\phi_i}{dt}=\frac{3}{t_{i+3}-t_i}\,\phi_{i,3}-\frac{3}{t_{i+4}-t_{i+1}}\,\phi_{i+1,3},
\]

where the $\phi_{j,3}$'s are the B-spline bases of order 3 with knots $\tau$, we can calculate



dt \ dt 



F(xo,zo,t) dt 



(5.30) 



< 



< 



dt 



L2[o,T] 

3 



dko 
dt 



F(xo,zo,t) 



L2[0,T] 



+ 



< 



6 



\/j(xo,zo,t) 



iy3minj — 
6 



V3MA 



\/j(xo,zo,t) < 



v/J(xo,zo,i) 
12 



V(4i^2 + 2)rC7ii^4|r| 



< 4^6(4^2 + 2)rKC7iK4|r|^/2_ 



Because ^{t}-^{t) = 0, 



''^"E,(i,,{t),t)Mt)(^it)-^{t)]dt 



dt 



dt 



dt 
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(5.31) < Ki 
= Ki 



■i+4 



ti+4 



dt ^ ^ dt ^ ^ 



d^XQ 



U Jt 

ii + 4 



dt2 



(s) ds 



dt + Ki 
dt + Ki 



ti+4 



dx , . dx ,-. 



dt 



U 



d^XQ 



ti+4 



df^ 

d'^XQ 



df^ 



is) 
is) 



ds dt + Ki 

ti+4 



U Jt 



'fx 

dt^ 

d^x 



(s) ds dt 
dsdt 



dt ds + Ki 



ti4-4 ft 



ti J 



K2 ds dt 



K1K2 , ,9 

{ti+4 -S)ds-\ (ti+4 - ti) 



d^xo 






dt^ 


L2[0,T] 


[/ 

Ut, 



{ti+4 - s)^ ds 



1/2 



^^'^'(t t)' 
H 7> [ti+i — ti) 



< 



< 



d^XQ 



V3 



df^ 
d'^XQ 



L2[o,r] 



dt^ 



(ti+4 — ti)^^"^ H ^— ^(ti+4 — ti)^ 



|r|3/2 + 8KiJf2|rp. 



L2[0,T] 

For the second term from the last on the right-hand side of (5.29), 
'^'F^(xo(t),t)(A^(t)(F(x(t),t)-F(x(t"),t"))dt 

<Ki r^'\F{x{t),t)-F{x{t),t)\dt 
Jti 

= K, [\F^{x{s),s)^ + Ft{x{s),s))ds 



(5.32) 



ti+4 ft 



{KiRi + Ki)dsdt<K 



dt 

^ KiRi + Ki 



{ti+4 — tiY 



'2/r> , iM_|2 



and 



<SKiiRi + l)\T\ 



F^ (io it) , t)^i (t) (F(io (i) , t) - i^(xo it) ,t))dt 
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(5.33) 



<Ki [ |F(xo(t),t) - F{Mt),t)\ dt 
Jti 



<kI 



''F,{M''^),s)^ + Ft{Ms),s)]ds 



ti+4 ft 



ti+4 ft 



dt 



dt 

+ Ki]ds dt 



dt 



dt 



s) 



7^2 

dsdt + ^{ti+4-tif 



(ti+4 — s) dsdt H — —(ti+4 — t. 



■i+4 



dt 



{s)-Fixo{s),s) + F{xo{s),s) 



(ti+4 — s) ds dt 



ti+4, 



dxo 



s) - F{xo{s),s] 



KlKo 



dt 

2 Kf 2 
(ti+4 — ti) +-^{'ti+i—t^ 



(ti+4 — s) ds dt 



< 



dt^ 



i^(xo,' 



L2 [0,T] 



(ti+4 — ti)^^"^ 



Kl 



+ :^(Ko + i){t^+^-tif 



< 



8^2 



V3 



dt2 



i^(xo,- 



L2[o,r] 



^/3 
16K^ 



W>^(xo,zo,t)|r|3/2 + 8i^2(^^ + 1)1^1 



< ^^V(4K2 + 2)rCi/f4|r|9/2 + 8i^2(Ko + l)|rp. 
v3 



Now we calculate the last term 

''ti+4 



[ '^'F,(xo(t),t)(/.,(t)(F(x(t"),t") -F(xo(t"),t"))dt 
Jti 



ti+4 



F^{ko{t),t)<Pi{t)dt \F{x(t),t)-F{koit),t}\ 



(5.34) 
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'^'F^(xo(t),t)0i(t)dt |F^(x',t)(x(t) -xo(t))| 



>7 



-i+4 



(f>i{t)dt 



7|x(t)-io(i)| 



i+4 ~ h| 



|x(t)-io(t)l 



2 1 I 

>^|x(t)-io(t)|, 



where $x'$ is a number between $x(\bar t)$ and $\hat x_0(\bar t)$ and we use the formula (equality (4.29) and Theorem 4.23 in Schumaker [23])

\[
\int_{t_i}^{t_{i+4}}\phi_i(t)\,dt=\frac{t_{i+4}-t_i}{4}.
\]
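Two standard facts about normalized cubic B-splines are used in this proof: the integral of $\phi_i$ over its support equals $(t_{i+4}-t_i)/4$, and $d\phi_i/dt=3\phi_{i,3}/(t_{i+3}-t_i)-3\phi_{i+1,3}/(t_{i+4}-t_{i+1})$. The following sketch checks both numerically. It is an illustration under assumptions: the knots are hypothetical, the basis is evaluated with the Cox-de Boor recursion, and indices are 0-based.

```python
def bspline_basis(i, k, knots, t):
    # Order-k (degree k-1) B-spline starting at knots[i], via the Cox-de Boor recursion.
    if k == 1:
        return 1.0 if knots[i] <= t < knots[i + 1] else 0.0
    left = right = 0.0
    d1 = knots[i + k - 1] - knots[i]
    if d1 > 0:
        left = (t - knots[i]) / d1 * bspline_basis(i, k - 1, knots, t)
    d2 = knots[i + k] - knots[i + 1]
    if d2 > 0:
        right = (knots[i + k] - t) / d2 * bspline_basis(i + 1, k - 1, knots, t)
    return left + right

def integrate_basis(i, knots):
    # One Simpson panel per knot interval; Simpson is exact for cubic pieces.
    total = 0.0
    for a, b in zip(knots[i:i + 4], knots[i + 1:i + 5]):
        m = 0.5 * (a + b)
        total += (b - a) / 6.0 * (
            bspline_basis(i, 4, knots, a)
            + 4.0 * bspline_basis(i, 4, knots, m)
            + bspline_basis(i, 4, knots, b)
        )
    return total

def derivative_by_formula(i, knots, t):
    # Derivative of the order-4 basis from two order-3 bases.
    return (3.0 / (knots[i + 3] - knots[i]) * bspline_basis(i, 3, knots, t)
            - 3.0 / (knots[i + 4] - knots[i + 1]) * bspline_basis(i + 1, 3, knots, t))

def derivative_by_difference(i, knots, t, h=1e-6):
    # Central finite difference of the order-4 basis.
    return (bspline_basis(i, 4, knots, t + h) - bspline_basis(i, 4, knots, t - h)) / (2.0 * h)

knots = [0.0, 0.5, 1.2, 2.0, 3.1, 4.0, 4.6]   # hypothetical distinct knots

integral_gap = max(
    abs(integrate_basis(i, knots) - (knots[i + 4] - knots[i]) / 4.0) for i in range(3)
)

samples = [
    (i, knots[i] + frac * (knots[i + 4] - knots[i]))
    for i in range(3)
    for frac in (0.1, 0.3, 0.5, 0.7, 0.9)
]
derivative_gap = max(
    abs(derivative_by_formula(i, knots, t) - derivative_by_difference(i, knots, t))
    for i, t in samples
)

print(integral_gap, derivative_gap)
```

Both gaps are at the level of rounding and finite-difference error, which is consistent with the two identities on a non-uniform knot sequence.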



From (5.29)-(5.34), we have 



7^|r| 



x(t)-io(i)| 



< 



< 



ti+4 



(io {t),t)^,{t) (F(x(t) , t) - F(io {t) ,t))dt 



ti+4, 



(ixo 



+ 



+ 



+ 



dt V dt 

ti+4 



F{xo{t),t)]dt 



ti 





dx\ 


V dt 





ti+4, 



F^{xo{t),t)Mt){F{x{t),t) - F{x{t),t))dt 



ti+4 



F^ (xo {t),t)^i{t){F{ko {i) ,t)-F{ko{t),t))dt 



< 4^6(47^2 + 2)TKCiKi\Tf/^ 



+ 



8Ki 



^/3 



d^xo 



df^ 



Irf/'^ + 8KiK2\t\ 



L2[o,T] 
+ 8Xf(iil + l)|T|2 



< 



8Ki 



^/3 



fV(4i^2 + 2)rC7ii^4|r|9/2 ^ 8Kf{Ko + l)|r|2 

|^|3/2 



^/3 
d^xo 



dt2 



L2[o,T] 
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+ {8KiK2 + 8Kf{Ri + 1) + 8Kf{Ko + l))|r|^ 



+ 



ml 

V3 



V(4K2 + 2)rCi/^4|T|9/2^ 



Therefore, 



|x(t)-xo(t)| < 



where 



/3i 



/53 

/36 



|r|^/2 + /32K|T| 

L2[0,T] 

+ k(4V6^ + /33)/34\/r|r|3/2 + /3g V^|T^/^ 



8^1 

1 



/32 = ^(8i^iK2 + 8Ki{Rx + 1) + 8i^^(iv:o + 1)), 

T 



8K 



/3- 



1 

7^ 



V(4K2 + 2)Ci/C4, 



V3 



^V(4K2T2)C7i/C4. 



□ 



Let

\[
t_0\in\operatorname*{arg\,max}_{0\le s\le T}|x(s)-\hat x_0(s)|.
\]

Then either $t_0\in(0,T)$ or $t_0=T$. If $t_0\in(0,T)$, we have $\frac{d\hat x_0}{dt}(t_0)-\frac{dx}{dt}(t_0)=0$. By Lemma 10,



|x-xo| 



(5.35) 



</3iK 



x(to) -xo(to)| 



\T\^'^+h^\T\ 

L2[o,r] 



In the case $t_0=T$, let

\[
s_0=\inf\Bigl\{0\le s\le T:\frac{d\hat x_0}{dt}(t)-\frac{dx}{dt}(t)\ne0,\ \forall\,s<t<T\Bigr\}\wedge T.
\]

If $s_0=T$, there is an increasing sequence $\{s_n,\ n\ge1\}$ which converges to $T$ and $\frac{d\hat x_0}{dt}(s_n)-\frac{dx}{dt}(s_n)=0$ for all $n$, hence we still have inequality (5.35). If $s_0<T$, then either $s_0=0$ or $\frac{d\hat x_0}{dt}(s_0)-\frac{dx}{dt}(s_0)=0$. In both cases, we have
|x(so) -xo(so)| 



L2[o,T] 

+ A^(4^/6;^ + /33)/34\/r|r|3/2 + /3e Vr|r|7/2. 



Without loss of generality, we assume that $\frac{d\hat x_0}{dt}(t)-\frac{dx}{dt}(t)>0$ for all $t\in(s_0,T)$. Hence, $\hat x_0(t)-x(t)$ is increasing in $(s_0,T)$ and $\hat x_0(T)-x(T)>0$. We have the following two cases.

• If $\hat x_0(s_0)-x(s_0)\ge0$, then for any $t\in(s_0,T)$,

\[
F(x(t),t)-F(\hat x_0(t),t)=F_x(x',t)\bigl(x(t)-\hat x_0(t)\bigr)\ge0.
\]

Now we have



J(xo) > 



so 
T 



dt 



F{±o{t),t) dt 



> 



> 



T 



( dxQ dx\ 



V dt 



dt 

dt J 



so 



^-^\dt 
dt dt 



So 



|x-xo| 



[(x(r)-xo(r))-(x(so)-xo(so))]^ 



|x(r)-xo(r)| 



< |x(so) - xo(so)| + VJ{^o) 
d^±o 



df^ 



L2[o,T] 



+ k(4\/6^ + /33)/34\/r|T|3/2 + /35\/r|r|3 + /JqItI 



7/2 



where 



/35 = \/(4i^' + 2)CiK^. 



• If $\hat x_0(s_0)-x(s_0)<0$, then there exists an $s'\in(s_0,T)$ such that $\hat x_0(s')-x(s')=0$. So we can still use the same argument as in the first case with the lower limits $s_0$ of all the integrals replaced by $s'$. □